[jira] [Work logged] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
[ https://issues.apache.org/jira/browse/HIVE-21100?focusedWorklogId=716215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-716215 ]

ASF GitHub Bot logged work on HIVE-21100:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 27/Jan/22 06:43
            Start Date: 27/Jan/22 06:43
    Worklog Time Spent: 10m
      Work Description: pvary commented on a change in pull request #2921:
URL: https://github.com/apache/hive/pull/2921#discussion_r793293292

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
## @@ -97,6 +101,40 @@
   public MoveTask() {
     super();
   }

+  public void flattenUnionSubdirectories(Path sourcePath) throws HiveException {
+    try {
+      FileSystem fs = sourcePath.getFileSystem(conf);
+      LOG.info("Checking " + sourcePath + " for subdirectories to flatten");
+      Set<Path> unionSubdirs = new HashSet<>();
+      if (fs.exists(sourcePath)) {
+        RemoteIterator<LocatedFileStatus> i = fs.listFiles(sourcePath, true);
+        String prefix = AbstractFileMergeOperator.UNION_SUDBIR_PREFIX;
+        while (i.hasNext()) {
+          Path path = i.next().getPath();
+          Path parent = path.getParent();
+          if (parent.getName().startsWith(prefix)) {
+            // We do the rename by including the name of the parent directory in the filename so that there are
+            // no clashes when we move the files to the parent directory. Ex. HIVE_UNION_SUBDIR_1/00_0 -> 1_00_0
+            String parentOfParent = parent.getParent().toString();
+            String parentNameSuffix = parent.getName().substring(prefix.length());
+
+            fs.rename(path, new Path(parentOfParent + "/" + parentNameSuffix + "_" + path.getName()));

Review comment:
       What happens if this filename is already used?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 716215)
    Time Spent: 2h (was: 1h 50m)

> Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-21100
>                 URL: https://issues.apache.org/jira/browse/HIVE-21100
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, HIVE-21100.3.patch, HIVE-21100.patch
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and the clause UNION ALL is the last step of the query, Hive on Tez will create a subdirectory for each branch of the UNION ALL.
> With this patch the subdirectories are removed, and the files are renamed and moved to the parent directory.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
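The review question above asks what happens when the flattened target name already exists. A minimal, pure-Java sketch of one collision-safe naming scheme (illustrative class and method names, not the patch's actual code): build the preferred `branch_filename` name, and append a numeric suffix only when that name is already taken.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: collision-safe target names when flattening HIVE_UNION_SUBDIR_N directories. */
public class UnionFlattenSketch {
    static final String PREFIX = "HIVE_UNION_SUBDIR_";

    /**
     * Builds the target file name for a file inside a union subdirectory,
     * appending a numeric suffix if the preferred name is already taken.
     */
    public static String targetName(String subdirName, String fileName, Set<String> existing) {
        String branch = subdirName.substring(PREFIX.length());  // e.g. "1"
        String candidate = branch + "_" + fileName;             // e.g. "1_000000_0"
        int attempt = 0;
        while (existing.contains(candidate)) {
            attempt++;
            candidate = branch + "_" + fileName + "_" + attempt;
        }
        existing.add(candidate);
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> taken = new HashSet<>();
        System.out.println(targetName("HIVE_UNION_SUBDIR_1", "000000_0", taken)); // 1_000000_0
        System.out.println(targetName("HIVE_UNION_SUBDIR_1", "000000_0", taken)); // 1_000000_0_1
    }
}
```

In the real `MoveTask` code the `existing` set would have to be derived from a listing of the destination directory before renaming; a rename into an occupied HDFS path would otherwise simply fail rather than overwrite.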
[jira] [Work logged] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
[ https://issues.apache.org/jira/browse/HIVE-21100?focusedWorklogId=716213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-716213 ]

ASF GitHub Bot logged work on HIVE-21100:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 27/Jan/22 06:42
            Start Date: 27/Jan/22 06:42
    Worklog Time Spent: 10m
      Work Description: pvary commented on a change in pull request #2921:
URL: https://github.com/apache/hive/pull/2921#discussion_r793292863

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
## @@ -97,6 +101,40 @@
   public MoveTask() {
     super();
   }

+  public void flattenUnionSubdirectories(Path sourcePath) throws HiveException {
+    try {
+      FileSystem fs = sourcePath.getFileSystem(conf);
+      LOG.info("Checking " + sourcePath + " for subdirectories to flatten");
+      Set<Path> unionSubdirs = new HashSet<>();
+      if (fs.exists(sourcePath)) {
+        RemoteIterator<LocatedFileStatus> i = fs.listFiles(sourcePath, true);

Review comment:
       You have mentioned that ACID does not need this. Could we avoid these calls when they are not needed? Otherwise we make every query slower.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 716213)
    Time Spent: 1h 50m (was: 1h 40m)

> Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-21100
>                 URL: https://issues.apache.org/jira/browse/HIVE-21100
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, HIVE-21100.3.patch, HIVE-21100.patch
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and the clause UNION ALL is the last step of the query, Hive on Tez will create a subdirectory for each branch of the UNION ALL.
> With this patch the subdirectories are removed, and the files are renamed and moved to the parent directory.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
[ https://issues.apache.org/jira/browse/HIVE-21100?focusedWorklogId=716209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-716209 ]

ASF GitHub Bot logged work on HIVE-21100:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 27/Jan/22 06:38
            Start Date: 27/Jan/22 06:38
    Worklog Time Spent: 10m
      Work Description: pvary commented on a change in pull request #2921:
URL: https://github.com/apache/hive/pull/2921#discussion_r793291201

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
## @@ -97,6 +101,40 @@
   public MoveTask() {
     super();
  }

+  public void flattenUnionSubdirectories(Path sourcePath) throws HiveException {
+    try {
+      FileSystem fs = sourcePath.getFileSystem(conf);
+      LOG.info("Checking " + sourcePath + " for subdirectories to flatten");
+      Set<Path> unionSubdirs = new HashSet<>();
+      if (fs.exists(sourcePath)) {

Review comment:
       This is a costly call. We get the same result by catching the relevant exception in `listFiles`, and we can save one FileSystem.exists call.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 716209)
    Time Spent: 1h 40m (was: 1.5h)

> Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-21100
>                 URL: https://issues.apache.org/jira/browse/HIVE-21100
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, HIVE-21100.3.patch, HIVE-21100.patch
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and the clause UNION ALL is the last step of the query, Hive on Tez will create a subdirectory for each branch of the UNION ALL.
> With this patch the subdirectories are removed, and the files are renamed and moved to the parent directory.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
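The review point above is that probing with `exists()` before listing costs an extra round trip to the filesystem (on HDFS, an extra NameNode RPC); listing directly and treating "not found" as an empty result gives the same behavior for one call less. A minimal sketch of the pattern using `java.nio.file` instead of the Hadoop `FileSystem` API (illustrative names; the real patch would catch Hadoop's `FileNotFoundException` around `listFiles`):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sketch: list a directory without a separate exists() probe. */
public class ListWithoutExists {
    /**
     * Lists a directory, treating a missing directory as "no entries"
     * instead of probing with an extra exists() call first.
     */
    public static List<Path> listOrEmpty(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                result.add(p);
            }
        } catch (NoSuchFileException e) {
            // Missing directory: same outcome as exists() == false, one round trip cheaper.
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Prints 0 as long as this directory does not exist in the working directory.
        System.out.println(listOrEmpty(Paths.get("surely-missing-dir-29481")).size());
    }
}
```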
[jira] [Updated] (HIVE-25903) Upgrade Joda time version
[ https://issues.apache.org/jira/browse/HIVE-25903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatasubrahmanian Narayanan updated HIVE-25903:
-------------------------------------------------
    Attachment: HIVE-25903.patch

> Upgrade Joda time version
> -------------------------
>
>                 Key: HIVE-25903
>                 URL: https://issues.apache.org/jira/browse/HIVE-25903
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Venkatasubrahmanian Narayanan
>            Assignee: Venkatasubrahmanian Narayanan
>            Priority: Minor
>         Attachments: HIVE-25903.patch
>
> Hive uses an older version of Joda time, which can cause issues with some workflows. Switching over to the latest version resolves the issue from Hive's end.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Assigned] (HIVE-25903) Upgrade Joda time version
[ https://issues.apache.org/jira/browse/HIVE-25903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatasubrahmanian Narayanan reassigned HIVE-25903:
----------------------------------------------------

> Upgrade Joda time version
> -------------------------
>
>                 Key: HIVE-25903
>                 URL: https://issues.apache.org/jira/browse/HIVE-25903
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Venkatasubrahmanian Narayanan
>            Assignee: Venkatasubrahmanian Narayanan
>            Priority: Minor
>
> Hive uses an older version of Joda time, which can cause issues with some workflows. Switching over to the latest version resolves the issue from Hive's end.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25893) NPE when reading Parquet data because ColumnVector isNull[] is not updated
[ https://issues.apache.org/jira/browse/HIVE-25893?focusedWorklogId=715996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715996 ]

ASF GitHub Bot logged work on HIVE-25893:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 20:48
            Start Date: 26/Jan/22 20:48
    Worklog Time Spent: 10m
      Work Description: soumyakanti3578 commented on a change in pull request #2970:
URL: https://github.com/apache/hive/pull/2970#discussion_r793035137

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
## @@ -500,22 +501,37 @@ private boolean compareDecimalColumnVector(DecimalColumnVector cv1, DecimalColum

   private boolean compareBytesColumnVector(BytesColumnVector cv1, BytesColumnVector cv2) {
     int length1 = cv1.vector.length;
     int length2 = cv2.vector.length;
-    if (length1 == length2) {
-      for (int i = 0; i < length1; i++) {
-        int innerLen1 = cv1.vector[i].length;
-        int innerLen2 = cv2.vector[i].length;
-        if (innerLen1 == innerLen2) {
-          for (int j = 0; j < innerLen1; j++) {
-            if (cv1.vector[i][j] != cv2.vector[i][j]) {
-              return false;
-            }
-          }
-        } else {
+    if (length1 != length2) {
+      return false;
+    }
+
+    for (int i = 0; i < length1; i++) {
+      // check for different nulls
+      if (columnVectorsDifferNullForSameIndex(cv1, cv2, i)) {
+        return false;
+      }
+
+      // if they are both null, continue
+      // else if one of them is null, return false
+      if (cv1.isNull[i] && cv2.isNull[i]) {
+        continue;
+      } else if (cv1.isNull[i] || cv2.isNull[i]) {

Review comment:
       Thanks @kasakrisz! Removed it.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715996)
    Time Spent: 40m (was: 0.5h)

> NPE when reading Parquet data because ColumnVector isNull[] is not updated
> --------------------------------------------------------------------------
>
>                 Key: HIVE-25893
>                 URL: https://issues.apache.org/jira/browse/HIVE-25893
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Soumyakanti Das
>            Assignee: Soumyakanti Das
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> In [VectorizedListColumnReader.java|https://github.com/apache/hive/blob/595f3bc9d612f02581bd3377ee0107efd6553ae6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java], {{isNull[]}} is used in the comparison methods (e.g. [columnVectorsDifferNullForSameIndex|https://github.com/apache/hive/blob/595f3bc9d612f02581bd3377ee0107efd6553ae6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java#L524]); however, {{isNull}} is always {{false}} as it is never updated in [getChildData|https://github.com/apache/hive/blob/595f3bc9d612f02581bd3377ee0107efd6553ae6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java#L401].
> This could result in a NullPointerException like:
> {code}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.compareBytesColumnVector(VectorizedListColumnReader.java:506)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.compareColumnVector(VectorizedListColumnReader.java:432)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.setIsRepeating(VectorizedListColumnReader.java:367)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.convertValueListToListColumnVector(VectorizedListColumnReader.java:360)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:83)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:438)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:377)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:100)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.
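The bug and fix above come down to checking the null masks before dereferencing values: if the masks differ the vectors differ, if both slots are null they are equal, and only non-null slots get their bytes compared. A self-contained sketch of that comparison logic with plain arrays standing in for Hive's `BytesColumnVector` (illustrative names, not the patch's actual code):

```java
import java.util.Arrays;

/** Sketch: null-mask-aware comparison of two byte-array "column vectors". */
public class NullAwareCompare {
    /**
     * Compares two value arrays plus null masks the way the patched
     * compareBytesColumnVector does: null masks are checked first,
     * so null slots are never dereferenced (no NPE).
     */
    public static boolean vectorsEqual(byte[][] v1, boolean[] null1, byte[][] v2, boolean[] null2) {
        if (v1.length != v2.length) {
            return false;
        }
        for (int i = 0; i < v1.length; i++) {
            if (null1[i] != null2[i]) {
                return false;     // one side null, the other not
            }
            if (null1[i]) {
                continue;         // both null: equal at this slot, skip dereference
            }
            if (!Arrays.equals(v1[i], v2[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[][] a = { "x".getBytes(), null };
        byte[][] b = { "x".getBytes(), null };
        boolean[] nulls = { false, true };
        System.out.println(vectorsEqual(a, nulls, b, nulls)); // true: null slot never dereferenced
    }
}
```

The original code crashed precisely because `isNull` was never populated, so `cv1.vector[i].length` was evaluated on a slot whose value array entry was `null`.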
[jira] [Work logged] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do
[ https://issues.apache.org/jira/browse/HIVE-25883?focusedWorklogId=715831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715831 ]

ASF GitHub Bot logged work on HIVE-25883:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 17:18
            Start Date: 26/Jan/22 17:18
    Worklog Time Spent: 10m
      Work Description: kgyrtkirk commented on a change in pull request #2971:
URL: https://github.com/apache/hive/pull/2971#discussion_r792867605

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
## @@ -434,8 +437,18 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
     return success;
   }

-  private boolean hasDataBelowWatermark(FileSystem fs, Path path, long highWatermark) throws IOException {
-    FileStatus[] children = fs.listStatus(path);
+  private boolean hasDataBelowWatermark(AcidDirectory acidDir, FileSystem fs, Path path, long highWatermark)
+      throws IOException {
+    Set<Path> acidPaths = new HashSet<>();
+    for (ParsedDelta delta : acidDir.getCurrentDirectories()) {
+      acidPaths.add(delta.getPath());
+    }
+    if (acidDir.getBaseDirectory() != null) {
+      acidPaths.add(acidDir.getBaseDirectory());
+    }
+    FileStatus[] children = fs.listStatus(path, p -> {
+      return !acidPaths.contains(p);
+    });
     for (FileStatus child : children) {
       if (isFileBelowWatermark(child, highWatermark)) {

Review comment:
       After some thinking I convinced myself that you are right :)
       * I've changed it to return `true` for non-directories
         * in the background these should appear as `obsolete` files anyway, so it should not cause any real trouble in the scope of HIVE-25883; but it makes the method live up to its name...
       * since the latest patch we are checking and excluding all the dirs the actual acid dir is using - so if we have anything below or even at the writeid level, that should be considered invalid; the `nothingToCleanAfterAbortsDelta` testcase is a "complicated" case, but the default case is similar to this with the new checks.
Pushed a new commit to update these.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715831)
    Time Spent: 1h 40m (was: 1.5h)

> Enhance Compaction Cleaner to skip when there is nothing to do
> --------------------------------------------------------------
>
>                 Key: HIVE-25883
>                 URL: https://issues.apache.org/jira/browse/HIVE-25883
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The cleaner works the following way:
> * it identifies obsolete directories (delta dirs that have no open txns)
> * removes them, and it is done
> If there are no obsolete directories, that is attributed to possibly open txns, so the request should be retried later.
> However, if for some reason the directory was already cleaned, it similarly has no obsolete directories, and thus the request is retried forever.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
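The patch discussed above passes a filter to `listStatus` so the listing skips the paths the Cleaner already knows are live (the current base and delta directories). A minimal sketch of the same pattern using `java.nio.file` in place of Hadoop's `FileSystem.listStatus(Path, PathFilter)` (illustrative names and directory layout, not the actual Cleaner code):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch: list a directory while excluding paths already known to be live. */
public class FilteredListing {
    /**
     * Lists the entries of a directory, skipping paths the caller already
     * knows are in use (base/delta dirs in the Cleaner's case), so only
     * candidates for cleanup are returned.
     */
    public static List<Path> listExcluding(Path dir, Set<Path> live) throws IOException {
        List<Path> result = new ArrayList<>();
        // The filter runs server-side of the listing loop: excluded paths never reach the caller.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, p -> !live.contains(p))) {
            for (Path p : stream) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cleaner");
        Path base = Files.createDirectory(dir.resolve("base_5"));        // "live" dir, excluded
        Files.createDirectory(dir.resolve("delta_1_1"));                 // stale dir, kept
        for (Path p : listExcluding(dir, Set.of(base))) {
            System.out.println(p.getFileName()); // only delta_1_1 survives the filter
        }
    }
}
```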
[jira] [Work logged] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
[ https://issues.apache.org/jira/browse/HIVE-21100?focusedWorklogId=715820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715820 ]

ASF GitHub Bot logged work on HIVE-21100:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 16:53
            Start Date: 26/Jan/22 16:53
    Worklog Time Spent: 10m
      Work Description: hsnusonic commented on pull request #2921:
URL: https://github.com/apache/hive/pull/2921#issuecomment-1022392409

       @pvary After some manual testing, I found this UNION_SUBDIR doesn't exist for ACID tables. It only exists for external tables on Tez, so I added a qtest to tez. Could you help review it?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715820)
    Time Spent: 1.5h (was: 1h 20m)

> Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-21100
>                 URL: https://issues.apache.org/jira/browse/HIVE-21100
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, HIVE-21100.3.patch, HIVE-21100.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and the clause UNION ALL is the last step of the query, Hive on Tez will create a subdirectory for each branch of the UNION ALL.
> With this patch the subdirectories are removed, and the files are renamed and moved to the parent directory.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25809) Implement URI Mapping for KuduStorageHandler in Hive
[ https://issues.apache.org/jira/browse/HIVE-25809?focusedWorklogId=715789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715789 ]

ASF GitHub Bot logged work on HIVE-25809:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 16:19
            Start Date: 26/Jan/22 16:19
    Worklog Time Spent: 10m
      Work Description: saihemanth-cloudera closed pull request #2877:
URL: https://github.com/apache/hive/pull/2877

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715789)
    Time Spent: 0.5h (was: 20m)

> Implement URI Mapping for KuduStorageHandler in Hive
> -----------------------------------------------------
>
>                 Key: HIVE-25809
>                 URL: https://issues.apache.org/jira/browse/HIVE-25809
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Security
>            Reporter: Sai Hemanth Gantasala
>            Assignee: Sai Hemanth Gantasala
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, there is no storage URI mapping for KuduStorageHandler based on the feature HIVE-24705. The API getURIForAuth() needs to be implemented in KuduStorageHandler.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25897) Move delta metric collection into AcidMetricsService
[ https://issues.apache.org/jira/browse/HIVE-25897?focusedWorklogId=715737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715737 ]

ASF GitHub Bot logged work on HIVE-25897:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 14:59
            Start Date: 26/Jan/22 14:59
    Worklog Time Spent: 10m
      Work Description: lcspinter commented on a change in pull request #2973:
URL: https://github.com/apache/hive/pull/2973#discussion_r792724777

## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
## @@ -1600,7 +1602,9 @@ public static ConfVars getMetaConf(String name) {
     // Deprecated Hive values that we are keeping for backwards compatibility.
     @Deprecated
     HIVE_CODAHALE_METRICS_REPORTER_CLASSES("hive.service.metrics.codahale.reporter.classes",
-        "hive.service.metrics.codahale.reporter.classes", "",
+        "hive.service.metrics.codahale.reporter.classes",
+        "org.apache.hadoop.hive.common.metrics.metrics2.JsonFileMetricsReporter, " +
+            "org.apache.hadoop.hive.common.metrics.metrics2.JmxMetricsReporter",

Review comment:
       This param is used to initialize the CodahaleMetricsReporter classes. Some unit tests use it, and since those tests were moved from the `hive-common` module to the `standalone-metastore-common` module, I had to add these values to the `MetastoreConf` to keep backward compatibility.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715737)
    Time Spent: 50m (was: 40m)

> Move delta metric collection into AcidMetricsService
> -----------------------------------------------------
>
>                 Key: HIVE-25897
>                 URL: https://issues.apache.org/jira/browse/HIVE-25897
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> DeltaFilesMetricReporter and AcidMetricsService are two different threads collecting ACID related metrics. It makes sense to merge those threads since they share the same goal.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25897) Move delta metric collection into AcidMetricsService
[ https://issues.apache.org/jira/browse/HIVE-25897?focusedWorklogId=715734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715734 ]

ASF GitHub Bot logged work on HIVE-25897:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 14:55
            Start Date: 26/Jan/22 14:55
    Worklog Time Spent: 10m
      Work Description: lcspinter commented on a change in pull request #2973:
URL: https://github.com/apache/hive/pull/2973#discussion_r792720421

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
## @@ -429,7 +422,8 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
         .map(Path::getName).collect(Collectors.joining(",")));
     boolean success = remove(location, ci, obsoleteDirs, true, fs, extraDebugInfo);
     if (dir.getObsolete().size() > 0) {
-      updateDeltaFilesMetrics(ci.dbname, ci.tableName, ci.partName, dir.getObsolete());
+      AcidMetricService.updateMetricsFromCleaner(ci.dbname, ci.tableName, ci.partName, dir.getObsolete(), conf,

Review comment:
       You're right. Fixed it.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715734)
    Time Spent: 40m (was: 0.5h)

> Move delta metric collection into AcidMetricsService
> -----------------------------------------------------
>
>                 Key: HIVE-25897
>                 URL: https://issues.apache.org/jira/browse/HIVE-25897
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> DeltaFilesMetricReporter and AcidMetricsService are two different threads collecting ACID related metrics. It makes sense to merge those threads since they share the same goal.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25897) Move delta metric collection into AcidMetricsService
[ https://issues.apache.org/jira/browse/HIVE-25897?focusedWorklogId=715719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715719 ]

ASF GitHub Bot logged work on HIVE-25897:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 14:35
            Start Date: 26/Jan/22 14:35
    Worklog Time Spent: 10m
      Work Description: klcopp commented on a change in pull request #2973:
URL: https://github.com/apache/hive/pull/2973#discussion_r792691109

## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
## @@ -1600,7 +1602,9 @@ public static ConfVars getMetaConf(String name) {
     // Deprecated Hive values that we are keeping for backwards compatibility.
     @Deprecated
     HIVE_CODAHALE_METRICS_REPORTER_CLASSES("hive.service.metrics.codahale.reporter.classes",
-        "hive.service.metrics.codahale.reporter.classes", "",
+        "hive.service.metrics.codahale.reporter.classes",
+        "org.apache.hadoop.hive.common.metrics.metrics2.JsonFileMetricsReporter, " +
+            "org.apache.hadoop.hive.common.metrics.metrics2.JmxMetricsReporter",

Review comment:
       What does this do? Especially since the config is deprecated?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715719)
    Time Spent: 0.5h (was: 20m)

> Move delta metric collection into AcidMetricsService
> -----------------------------------------------------
>
>                 Key: HIVE-25897
>                 URL: https://issues.apache.org/jira/browse/HIVE-25897
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> DeltaFilesMetricReporter and AcidMetricsService are two different threads collecting ACID related metrics.
> It makes sense to merge those threads since they share the same goal.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715716 ]

ASF GitHub Bot logged work on HIVE-25746:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 14:21
            Start Date: 26/Jan/22 14:21
    Worklog Time Spent: 10m
      Work Description: vcsomor commented on a change in pull request #2974:
URL: https://github.com/apache/hive/pull/2974#discussion_r792685276

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
## @@ -155,18 +159,28 @@ public void run() {
     // when min_history_level is finally dropped, than every HMS will commit compaction the new way
     // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead.
     for (CompactionInfo compactionInfo : readyToClean) {
-      cleanerList.add(CompletableFuture.runAsync(ThrowingRunnable.unchecked(
-          () -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor));
+      String tableName = compactionInfo.getFullTableName();
+      String partition = compactionInfo.getFullPartitionName();
+      CompletableFuture<Void> asyncJob =
+          CompletableFuture.runAsync(
+                  ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)),
+                  cleanerExecutor)
+              .exceptionally(t -> {
+                cleanerErrors.incrementAndGet();
+                LOG.error("Error during the cleaning the table {} / partition {}", tableName, partition, t);
+                return null;
+              });
+      cleanerList.add(asyncJob);
     }
     CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
+
+    if (metricsEnabled && handle != null) {

Review comment:
       This is why I left it there; at the same time I've removed it from the Initiator, where it cannot be null.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715716)
    Time Spent: 1h 40m (was: 1.5h)

> Compaction Failure Counter counted incorrectly
> ----------------------------------------------
>
>                 Key: HIVE-25746
>                 URL: https://issues.apache.org/jira/browse/HIVE-25746
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 4.0.0
>            Reporter: Viktor Csomor
>            Assignee: Viktor Csomor
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The count of the below metrics is incorrect upon an exception:
> - {{compaction_initator_failure_counter}}
> - {{compaction_cleaner_failure_counter}}
> Reasoning:
> The {{Initiator}}/{{Cleaner}} classes create a list of {{CompletableFuture}}s whose {{Runnable}} core exceptions are wrapped in {{RuntimeException}}s. The below code snippet waits for all cleaners to complete (the Initiator does it similarly):
> {code:java}
> try {
>   for (CompactionInfo compactionInfo : readyToClean) {
>     cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() ->
>         clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor));
>   }
>   CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
> } catch (Throwable t) {
>   // the lock timeout on AUX lock, should be ignored.
>   if (metricsEnabled && handle != null) {
>     failuresCounter.inc();
>   }
> {code}
> If the {{CompletableFuture#join}} throws an Exception, then the failure counter is incremented only once.
> Docs:
> {code}
> /**
>  * Returns the result value when complete, or throws an
>  * (unchecked) exception if completed exceptionally. To better
>  * conform with the use of common functional forms, if a
>  * computation involved in the completion of this
>  * CompletableFuture threw an exception, this method throws an
>  * (unchecked) {@link CompletionException} with the underlying
>  * exception as its cause.
>  *
>  * @return the result value
>  * @throws CancellationException if the computation was cancelled
>  * @throws CompletionException if this future completed
>  * exceptionally or a completion computation
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715714 ]

ASF GitHub Bot logged work on HIVE-25746:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 14:20
            Start Date: 26/Jan/22 14:20
    Worklog Time Spent: 10m
      Work Description: vcsomor commented on a change in pull request #2974:
URL: https://github.com/apache/hive/pull/2974#discussion_r792684511

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
## @@ -155,18 +159,28 @@ public void run() {
     // when min_history_level is finally dropped, than every HMS will commit compaction the new way
     // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead.
     for (CompactionInfo compactionInfo : readyToClean) {
-      cleanerList.add(CompletableFuture.runAsync(ThrowingRunnable.unchecked(
-          () -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor));
+      String tableName = compactionInfo.getFullTableName();
+      String partition = compactionInfo.getFullPartitionName();
+      CompletableFuture<Void> asyncJob =
+          CompletableFuture.runAsync(
+                  ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)),
+                  cleanerExecutor)
+              .exceptionally(t -> {
+                cleanerErrors.incrementAndGet();
+                LOG.error("Error during the cleaning the table {} / partition {}", tableName, partition, t);
+                return null;
+              });
+      cleanerList.add(asyncJob);
     }
     CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
+
+    if (metricsEnabled && handle != null) {

Review comment:
       According to IntelliJ's code-path analyzer, it might happen to be null.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715714) Time Spent: 1.5h (was: 1h 20m) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. 
To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public
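The fix under review swaps the single try/catch around join() for a per-future exceptionally() callback. A standalone sketch of that pattern follows; the class and method names here are illustrative, not Hive's actual ones:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncFailureCounter {

    // Runs every job and counts every failure. A counter local to the cycle
    // (rather than a long-lived field) keeps each run's metric independent.
    public static int runAndCountFailures(List<Runnable> jobs) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        AtomicInteger errors = new AtomicInteger();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Runnable job : jobs) {
            futures.add(CompletableFuture.runAsync(job, executor)
                    .exceptionally(t -> {
                        errors.incrementAndGet(); // every failed job is counted, not only the first
                        return null;
                    }));
        }
        // join() cannot throw here: exceptionally() already turned each
        // exceptional completion into a normal one.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        executor.shutdown();
        return errors.get();
    }

    public static void main(String[] args) {
        List<Runnable> jobs = new ArrayList<>();
        jobs.add(() -> { });
        jobs.add(() -> { throw new RuntimeException("boom"); });
        jobs.add(() -> { throw new IllegalStateException("bang"); });
        System.out.println(runAndCountFailures(jobs)); // prints 2
    }
}
```

Because each future completes normally via exceptionally(), the final allOf(...).join() can no longer hide later failures behind the first thrown CompletionException, which is exactly the miscounting described in the issue.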
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715711 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 14:19 Start Date: 26/Jan/22 14:19 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792682742 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -155,18 +159,28 @@ public void run() { // when min_history_level is finally dropped, than every HMS will commit compaction the new way // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead. for (CompactionInfo compactionInfo : readyToClean) { - cleanerList.add(CompletableFuture.runAsync(ThrowingRunnable.unchecked( - () -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor)); + String tableName = compactionInfo.getFullTableName(); + String partition = compactionInfo.getFullPartitionName(); + CompletableFuture asyncJob = + CompletableFuture.runAsync( + ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), + cleanerExecutor) + .exceptionally(t -> { +cleanerErrors.incrementAndGet(); +LOG.error("Error during the cleaning the table {} / partition {}", tableName, partition, t); +return null; + }); + cleanerList.add(asyncJob); } CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join(); + +if (metricsEnabled && handle != null) { + failuresCounter.inc(cleanerErrors.get()); Review comment: disregard -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715711) Time Spent: 1h 20m (was: 1h 10m) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. 
To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ >
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715709 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 14:18 Start Date: 26/Jan/22 14:18 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792681772 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -155,18 +159,28 @@ public void run() { // when min_history_level is finally dropped, than every HMS will commit compaction the new way // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead. for (CompactionInfo compactionInfo : readyToClean) { - cleanerList.add(CompletableFuture.runAsync(ThrowingRunnable.unchecked( - () -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor)); + String tableName = compactionInfo.getFullTableName(); + String partition = compactionInfo.getFullPartitionName(); + CompletableFuture asyncJob = + CompletableFuture.runAsync( + ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), + cleanerExecutor) + .exceptionally(t -> { +cleanerErrors.incrementAndGet(); +LOG.error("Error during the cleaning the table {} / partition {}", tableName, partition, t); +return null; + }); + cleanerList.add(asyncJob); } CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join(); + +if (metricsEnabled && handle != null) { + failuresCounter.inc(cleanerErrors.get()); Review comment: i don't think this is correct as cleanerErrors is a global variable and would be incremented on every iteration -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715709) Time Spent: 1h 10m (was: 1h) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. 
To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this f
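On the point that cleanerErrors is a long-lived field: feeding its raw value into failuresCounter.inc(...) on every cycle would re-report failures from earlier cycles. A hedged sketch (illustrative class, not Hive's) of reporting only the per-cycle delta:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CycleCounter {
    // Total failures since startup; incremented from the async callbacks.
    private final AtomicInteger totalErrors = new AtomicInteger();
    // How many failures the metric has already been told about.
    // Only read and written from the single thread driving the cycles.
    private long reportedSoFar = 0;

    public void recordError() {
        totalErrors.incrementAndGet();
    }

    // Called once per cycle; returns only the failures added since the
    // previous call, so a metrics counter can be inc()'d by this amount.
    public long errorsThisCycle() {
        long total = totalErrors.get();
        long delta = total - reportedSoFar;
        reportedSoFar = total;
        return delta;
    }
}
```

The alternative, a fresh AtomicInteger created inside each run() iteration, avoids this bookkeeping entirely.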
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715708 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 14:15 Start Date: 26/Jan/22 14:15 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792679287 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -155,18 +159,28 @@ public void run() { // when min_history_level is finally dropped, than every HMS will commit compaction the new way // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead. for (CompactionInfo compactionInfo : readyToClean) { - cleanerList.add(CompletableFuture.runAsync(ThrowingRunnable.unchecked( - () -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor)); + String tableName = compactionInfo.getFullTableName(); + String partition = compactionInfo.getFullPartitionName(); + CompletableFuture asyncJob = + CompletableFuture.runAsync( + ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), + cleanerExecutor) + .exceptionally(t -> { +cleanerErrors.incrementAndGet(); +LOG.error("Error during the cleaning the table {} / partition {}", tableName, partition, t); +return null; + }); + cleanerList.add(asyncJob); } CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join(); + +if (metricsEnabled && handle != null) { Review comment: can handle be null here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715708) Time Spent: 1h (was: 50m) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. 
To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > ret
[jira] [Work logged] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do
[ https://issues.apache.org/jira/browse/HIVE-25883?focusedWorklogId=715670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715670 ] ASF GitHub Bot logged work on HIVE-25883: - Author: ASF GitHub Bot Created on: 26/Jan/22 13:22 Start Date: 26/Jan/22 13:22 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2971: URL: https://github.com/apache/hive/pull/2971#discussion_r792628396 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -434,8 +437,18 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa return success; } - private boolean hasDataBelowWatermark(FileSystem fs, Path path, long highWatermark) throws IOException { -FileStatus[] children = fs.listStatus(path); + private boolean hasDataBelowWatermark(AcidDirectory acidDir, FileSystem fs, Path path, long highWatermark) + throws IOException { +Set acidPaths = new HashSet<>(); +for (ParsedDelta delta : acidDir.getCurrentDirectories()) { + acidPaths.add(delta.getPath()); +} +if (acidDir.getBaseDirectory() != null) { + acidPaths.add(acidDir.getBaseDirectory()); +} +FileStatus[] children = fs.listStatus(path, p -> { + return !acidPaths.contains(p); +}); for (FileStatus child : children) { if (isFileBelowWatermark(child, highWatermark)) { Review comment: 1. > I believe that in case there are files in the dir they already should be in the obsolete list Not necessarily, because the AcidDirectory the Cleaner uses is computed based on an older txnId (cleanerWaterMark), so there is a chance its obsolete list does not contain files that should be cleaned up eventually, which is what this method is supposed to figure out. (Right?) @deniskuzZ please correct me if I'm wrong about this since I know there have been recent changes to this logic 2. 
I meant that if the table dir contains:
- delta_5_5
- delta_1_5_v100 (minor compacted)
Then the cleaner should eventually remove delta_5_5, so there will be files to remove later, when the cleanerWaterMark is high enough -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715670) Time Spent: 1.5h (was: 1h 20m) > Enhance Compaction Cleaner to skip when there is nothing to do > -- > > Key: HIVE-25883 > URL: https://issues.apache.org/jira/browse/HIVE-25883 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > the cleaner works the following way: > * it identifies obsolete directories (delta dirs that don't have open > txns) > * removes them and is done > if there are no obsolete directories, that is attributed to the possibility that there might be > open txns, so the request should be retried later. > however, if for some reason the directory was already cleaned, it similarly has > no obsolete directories, and thus the request is retried forever
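The review above is about listing a table directory while skipping the base/delta paths the AcidDirectory already tracks, then checking whether anything left sits at or below the watermark. A simplified stand-in using java.nio instead of Hadoop's FileSystem API; the writeId parsing and the below-watermark test here are assumptions, much cruder than Hive's real ParsedDelta logic:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.stream.Stream;

public class WatermarkCheck {

    // Crude stand-in for Hive's delta/base name parsing: extracts the max
    // writeId from names like "delta_1_5" or "base_7"; -1 for anything else.
    static long maxWriteId(String name) {
        String[] parts = name.split("_");
        if (name.startsWith("delta") && parts.length >= 3) return Long.parseLong(parts[2]);
        if (name.startsWith("base") && parts.length >= 2) return Long.parseLong(parts[1]);
        return -1;
    }

    // Mirrors the reviewed hasDataBelowWatermark: ignore children that the
    // AcidDirectory already accounts for (live base/deltas), then report
    // whether any remaining child still holds data at or below the watermark.
    static boolean hasDataBelowWatermark(Path tableDir, Set<Path> acidPaths, long highWatermark)
            throws IOException {
        try (Stream<Path> children = Files.list(tableDir)) {
            return children
                    .filter(p -> !acidPaths.contains(p))
                    .anyMatch(p -> {
                        long id = maxWriteId(p.getFileName().toString());
                        return id < 0 || id <= highWatermark; // unparseable names count as data
                    });
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tbl");
        Files.createDirectory(dir.resolve("delta_5_5"));
        Path compacted = Files.createDirectory(dir.resolve("delta_1_5_v100"));
        // Only the compacted delta is tracked as live; delta_5_5 is not,
        // so it is data the cleaner will eventually have to remove.
        System.out.println(hasDataBelowWatermark(dir, Set.of(compacted), 10)); // prints true
    }
}
```

This matches the scenario in the comment: with delta_5_5 alongside a minor-compacted delta_1_5_v100, the check still reports removable data, so the cleaner should not conclude there is nothing left to do.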
[jira] [Resolved] (HIVE-24573) hive 3.1.2 drop table Sometimes it can't be deleted
[ https://issues.apache.org/jira/browse/HIVE-24573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Pawar resolved HIVE-24573. - Resolution: Fixed ok > hive 3.1.2 drop table Sometimes it can't be deleted > --- > > Key: HIVE-24573 > URL: https://issues.apache.org/jira/browse/HIVE-24573 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: paul >Assignee: Nisarg Nagrale >Priority: Blocker > > Executing a "drop table if exists trade_4_Temp448" statement, the table cannot be > deleted; the hive.log shows: > 2020-12-29T07:30:04,840 ERROR [HiveServer2-Background-Pool: Thread-6483] > metadata.Hive: Table dc_usermanage.trade_3_temp448 not found: > hive.dc_usermanage.trade_3_temp448 table not found > > The statement returns success. > > I suspect this problem only arises under heavy concurrent merging. We run a lot of tasks every day, and one or two of them hit this each day. > > metastore: mysql > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-25901) unable to run query
[ https://issues.apache.org/jira/browse/HIVE-25901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482457#comment-17482457 ] Pravin Pawar commented on HIVE-25901: - ok > unable to run query > --- > > Key: HIVE-25901 > URL: https://issues.apache.org/jira/browse/HIVE-25901 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (HIVE-25901) unable to run query
[ https://issues.apache.org/jira/browse/HIVE-25901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25901 started by Pravin Pawar. --- > unable to run query > --- > > Key: HIVE-25901 > URL: https://issues.apache.org/jira/browse/HIVE-25901 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-25901) unable to run query
[ https://issues.apache.org/jira/browse/HIVE-25901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Pawar resolved HIVE-25901. - Release Note: done Resolution: Fixed > unable to run query > --- > > Key: HIVE-25901 > URL: https://issues.apache.org/jira/browse/HIVE-25901 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-25901) unable to run query
[ https://issues.apache.org/jira/browse/HIVE-25901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Pawar reassigned HIVE-25901: --- Assignee: Pravin Pawar > unable to run query > --- > > Key: HIVE-25901 > URL: https://issues.apache.org/jira/browse/HIVE-25901 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715644 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 12:26 Start Date: 26/Jan/22 12:26 Worklog Time Spent: 10m Work Description: vcsomor commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792585640 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java ## @@ -128,9 +132,40 @@ protected static long updateCycleDurationMetric(String metric, long startedAt) { long elapsed = System.currentTimeMillis() - startedAt; LOG.debug("Updating {} metric to {}", metric, elapsed); Metrics.getOrCreateGauge(metric) - .set((int)elapsed); + .set((int) elapsed); return elapsed; } return 0; } + + @VisibleForTesting + protected static void waitAllAsyncTask(List> tasks) throws AsyncTaskCompletionException { +List exceptions = new ArrayList<>(); +for (CompletableFuture task : tasks) { + try { +task.join(); Review comment: Or simply an atomic integer would suffice -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715644) Time Spent: 50m (was: 40m) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. 
> - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > return reportJoin((r = result) == null ? waitingGet(false) : r); > } > {code} > (!) Let's suppose we have 10 cleaners and the 2nd throws an exception. The > {{catch}} block will be initiated and the {{failuresCounter}} will be > incremented. If there is any consecutive error amongst the remaining cleaners > the counter won't be incremented. 
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715641 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 12:25 Start Date: 26/Jan/22 12:25 Worklog Time Spent: 10m Work Description: vcsomor commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792584766 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java ## @@ -128,9 +132,40 @@ protected static long updateCycleDurationMetric(String metric, long startedAt) { long elapsed = System.currentTimeMillis() - startedAt; LOG.debug("Updating {} metric to {}", metric, elapsed); Metrics.getOrCreateGauge(metric) - .set((int)elapsed); + .set((int) elapsed); return elapsed; } return 0; } + + @VisibleForTesting + protected static void waitAllAsyncTask(List> tasks) throws AsyncTaskCompletionException { +List exceptions = new ArrayList<>(); +for (CompletableFuture task : tasks) { + try { +task.join(); Review comment: Which thread safe list implementation do you prefer in this case? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715641) Time Spent: 40m (was: 0.5h) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. 
> - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > return reportJoin((r = result) == null ? waitingGet(false) : r); > } > {code} > (!) Let's suppose we have 10 cleaners and the 2nd throws an exception. The > {{catch}} block will be initiated and the {{failuresCounter}} will be > incremented. If there is any consecutive error amongst the remaining cleaners > the counter won't be incremented. 
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715625 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 12:12 Start Date: 26/Jan/22 12:12 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792575492 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java ## @@ -128,9 +132,40 @@ protected static long updateCycleDurationMetric(String metric, long startedAt) { long elapsed = System.currentTimeMillis() - startedAt; LOG.debug("Updating {} metric to {}", metric, elapsed); Metrics.getOrCreateGauge(metric) - .set((int)elapsed); + .set((int) elapsed); return elapsed; } return 0; } + + @VisibleForTesting + protected static void waitAllAsyncTask(List> tasks) throws AsyncTaskCompletionException { +List exceptions = new ArrayList<>(); +for (CompletableFuture task : tasks) { + try { +task.join(); Review comment: isn't it similar to CompletableFuture.allOf(List cf).join() ? you could do exceptions++ when declaring CompletableFuture by adding .exceptionally(exception -> { collectedExceptions.add(exception); return null; } -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715625) Time Spent: 0.5h (was: 20m) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. 
To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > return reportJoin((r = result) == null ? waitingGet(false) : r); > } > {code} > (!) Suppose we have 10 cleaners and the 2nd throws an exception. The > {{catch}} block is entered and {{failuresCounter}} is incremented. If any of > the remaining cleaners also fail, the counter won't be incremented again. -- This message was sent by Atlassian Jira (v8.20.1#820001)
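For context, the per-task join approach can be sketched as follows. This is a toy example, not Hive's actual code: the class, method, and exception handling here are illustrative stand-ins for the `waitAllAsyncTask` helper discussed in the review. Joining each future individually lets every failure be observed, whereas a single `allOf(...).join()` surfaces at most one `CompletionException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class CollectFailuresSketch {
    // Joins every task individually so each failure is observed; a single
    // allOf(...).join() would surface only one CompletionException.
    static List<Throwable> joinAll(List<CompletableFuture<Void>> tasks) {
        List<Throwable> failures = new ArrayList<>();
        for (CompletableFuture<Void> task : tasks) {
            try {
                task.join();
            } catch (CompletionException e) {
                failures.add(e.getCause()); // unwrap the original exception
            }
        }
        return failures;
    }

    // Runs three tasks, two of which fail, and returns the failure count.
    static int demo() {
        List<CompletableFuture<Void>> tasks = new ArrayList<>();
        tasks.add(CompletableFuture.runAsync(() -> {}));
        tasks.add(CompletableFuture.runAsync(() -> { throw new RuntimeException("task 1 failed"); }));
        tasks.add(CompletableFuture.runAsync(() -> { throw new RuntimeException("task 2 failed"); }));
        return joinAll(tasks).size();
    }

    public static void main(String[] args) {
        System.out.println("failures counted: " + demo()); // 2
    }
}
```

With the `allOf` pattern from the issue description, the same three tasks would produce exactly one caught exception, which matches the undercounting the reporter describes.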
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715623 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 12:11 Start Date: 26/Jan/22 12:11 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792575492 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java ## @@ -128,9 +132,40 @@ protected static long updateCycleDurationMetric(String metric, long startedAt) { long elapsed = System.currentTimeMillis() - startedAt; LOG.debug("Updating {} metric to {}", metric, elapsed); Metrics.getOrCreateGauge(metric) - .set((int)elapsed); + .set((int) elapsed); return elapsed; } return 0; } + + @VisibleForTesting + protected static void waitAllAsyncTask(List<CompletableFuture<Void>> tasks) throws AsyncTaskCompletionException { +List<Throwable> exceptions = new ArrayList<>(); +for (CompletableFuture<Void> task : tasks) { + try { +task.join(); Review comment: isn't it the same as CompletableFuture.allOf(...).join() ? you could collect the exceptions when declaring each CompletableFuture by adding .exceptionally(exception -> { collectedExceptions.add(exception); return null; }) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715623) Time Spent: 20m (was: 10m) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The below metrics are counted incorrectly when an exception occurs. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > The {{Initiator}}/{{Cleaner}} classes create a list of {{CompletableFuture}}s > whose {{Runnable}} core exceptions are wrapped in {{RuntimeException}}s. > The code snippet below waits for all cleaners to complete (the Initiator does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If {{CompletableFuture#join}} throws an Exception, the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. 
To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > return reportJoin((r = result) == null ? waitingGet(false) : r); > } > {code} > (!) Suppose we have 10 cleaners and the 2nd throws an exception. The > {{catch}} block is entered and {{failuresCounter}} is incremented. If any of > the remaining cleaners also fail, the counter won't be incremented again. -- This message was sent by Atlassian Jira (v8.20.1#820001)
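The alternative the reviewer suggests, collecting each failure via `.exceptionally()` at declaration time, can be sketched like this. This is a toy example with illustrative names, not Hive's code; it assumes the jobs are plain `Runnable`s. Because every `exceptionally` stage recovers with `null`, the downstream futures all complete normally and `allOf().join()` no longer throws, yet every failure is recorded.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ExceptionallySketch {
    // Attaches an exceptionally() stage to each future at declaration time:
    // failures are recorded into a thread-safe queue and the resulting
    // futures all complete normally, so allOf().join() never throws here.
    static int countFailures(List<Runnable> jobs) {
        Queue<Throwable> collected = new ConcurrentLinkedQueue<>();
        CompletableFuture<?>[] futures = jobs.stream()
            .map(job -> CompletableFuture.runAsync(job)
                .exceptionally(ex -> { collected.add(ex); return null; }))
            .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(futures).join(); // waits, does not throw
        return collected.size();
    }

    public static void main(String[] args) {
        int failures = countFailures(List.of(
            () -> {},
            () -> { throw new RuntimeException("boom 1"); },
            () -> { throw new RuntimeException("boom 2"); }));
        System.out.println("failures counted: " + failures); // 2
    }
}
```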
[jira] [Updated] (HIVE-25899) Materialized view registry does not clean dropped views
[ https://issues.apache.org/jira/browse/HIVE-25899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25899: -- Labels: pull-request-available (was: ) > Materialized view registry does not clean dropped views > --- > > Key: HIVE-25899 > URL: https://issues.apache.org/jira/browse/HIVE-25899 > Project: Hive > Issue Type: Bug > Components: Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > CBO plans of materialized views which are enabled for query rewrite are > cached in HS2 (MaterializedViewsCache). > Dropping a materialized view should remove the entry from the cache; however, > the entry keys are not removed. > Cache state after running a whole PTest split: > {code} > this = {HiveMaterializedViewsRegistry@20858} > materializedViewsCache = {MaterializedViewsCache@20913} > materializedViews = {ConcurrentHashMap@67654} size = 3 >"default" -> {ConcurrentHashMap@28568} size = 8 > key = "default" > value = {ConcurrentHashMap@28568} size = 8 > "cluster_mv_2" -> {HiveRelOptMaterialization@67786} > "cluster_mv_1" -> {HiveRelOptMaterialization@67788} > "cluster_mv_4" -> {HiveRelOptMaterialization@67790} > "cluster_mv_3" -> {HiveRelOptMaterialization@67792} > "cmv_mat_view_n10" -> {HiveRelOptMaterialization@67794} > "distribute_mv_1" -> {HiveRelOptMaterialization@67796} > "distribute_mv_3" -> {HiveRelOptMaterialization@67798} > "distribute_mv_2" -> {HiveRelOptMaterialization@67800} >"db2" -> {ConcurrentHashMap@67772} size = 2 > key = "db2" > value = {ConcurrentHashMap@67772} size = 2 > "cmv_mat_view_n7" -> {HiveRelOptMaterialization@67806} > "cmv_mat_view2_n2" -> {HiveRelOptMaterialization@67808} >"count_distinct" -> {ConcurrentHashMap@67774} size = 0 > key = "count_distinct" > value = {ConcurrentHashMap@67774} size = 0 > sqlToMaterializedView = {ConcurrentHashMap@20915} size = 36 >"SELECT `cmv_basetable_n100`.`a`, 
`cmv_basetable_2_n100`.`c`\n FROM > `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON > (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE > `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, > `cmv_basetable_2_n100`.`c`" -> {ArrayList@67694} size = 0 > key = "SELECT `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n > FROM `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON > (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE > `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, > `cmv_basetable_2_n100`.`c`" > value = {ArrayList@67694} size = 0 >"select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from > `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, > `emps_parquet_n3`.`deptno`" -> {ArrayList@67696} size = 0 > key = "select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from > `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, > `emps_parquet_n3`.`deptno`" > value = {ArrayList@67696} size = 0 >"select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from > `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" -> > {ArrayList@67698} size = 1 > key = "select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from > `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" > value = {ArrayList@67698} size = 1 >"SELECT `value`, `key`, `partkey` FROM (SELECT `src_txn`.`key` + 100 as > `partkey`, `src_txn`.`value`, `src_txn`.`key` FROM `default`.`src_txn`, > `default`.`src_txn_2`\nWHERE `src_txn`.`key` = `src_txn_2`.`key`\n AND > `src_txn`.`key` > 200 AND `src_txn`.`key` < 250) `cluster_mv_3`" -> > {ArrayList@67700} size = 1 > key = "SELECT `value`, `key`, `partkey` FROM (SELECT `src_txn`.`key` + > 100 as `partkey`, `src_txn`.`value`, `src_txn`.`key` FROM > `default`.`src_txn`, `default`.`src_txn_2`\nWHERE `src_txn`.`key` = > `src_txn_2`.`key`\n AND `src_txn`.`key` > 200 AND `src_txn`.`key` < 250) > `cluster_mv_3`" 
> value = {ArrayList@67700} size = 1 >"SELECT `cmv_basetable_n6`.`a`, `cmv_basetable_2_n3`.`c`\n FROM > `default`.`cmv_basetable_n6` JOIN `default`.`cmv_basetable_2_n3` ON > (`cmv_basetable_n6`.`a` = `cmv_basetable_2_n3`.`a`)\n WHERE > `cmv_basetable_2_n3`.`c` > 10.0" -> {ArrayList@67702} size = 0 > key = "SELECT `cmv_basetable_n6`.`a`, `cmv_basetable_2_n3`.`c`\n FROM > `default`.`cmv_basetable_n6` JOIN `default`.`cmv_basetable_2_n3` ON > (`cmv_basetable_n6`.`a` = `cmv_basetable_2_n3`.`a`)\n WHERE > `cmv_basetable_2_n3`.`c` > 10.0" > value = {ArrayLis
[jira] [Work logged] (HIVE-25899) Materialized view registry does not clean dropped views
[ https://issues.apache.org/jira/browse/HIVE-25899?focusedWorklogId=715620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715620 ] ASF GitHub Bot logged work on HIVE-25899: - Author: ASF GitHub Bot Created on: 26/Jan/22 12:09 Start Date: 26/Jan/22 12:09 Worklog Time Spent: 10m Work Description: kasakrisz opened a new pull request #2975: URL: https://github.com/apache/hive/pull/2975 ### What changes were proposed in this pull request? `MaterializedViewsCache` stores nested maps: ``` somedb -> someview -> Materialization ``` 1. When removing entries from the inner map, check whether that map is empty and, if so, remove it from the outer map. 2. Add an `isEmpty()` method to `HiveMaterializedViewsRegistry` ### Why are the changes needed? See the description of the jira. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? ``` mvn test -Dtest=TestMaterializedViewsCache -pl ql ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715620) Remaining Estimate: 0h Time Spent: 10m > Materialized view registry does not clean dropped views > --- > > Key: HIVE-25899 > URL: https://issues.apache.org/jira/browse/HIVE-25899 > Project: Hive > Issue Type: Bug > Components: Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > CBO plans of materialized views which are enabled for query rewrite are > cached in HS2 (MaterializedViewsCache) > Dropping a materialized view should remove the entry from the cache; however, > the entry keys are not removed. 
> Cache state after running a whole PTest split: > {code} > this = {HiveMaterializedViewsRegistry@20858} > materializedViewsCache = {MaterializedViewsCache@20913} > materializedViews = {ConcurrentHashMap@67654} size = 3 >"default" -> {ConcurrentHashMap@28568} size = 8 > key = "default" > value = {ConcurrentHashMap@28568} size = 8 > "cluster_mv_2" -> {HiveRelOptMaterialization@67786} > "cluster_mv_1" -> {HiveRelOptMaterialization@67788} > "cluster_mv_4" -> {HiveRelOptMaterialization@67790} > "cluster_mv_3" -> {HiveRelOptMaterialization@67792} > "cmv_mat_view_n10" -> {HiveRelOptMaterialization@67794} > "distribute_mv_1" -> {HiveRelOptMaterialization@67796} > "distribute_mv_3" -> {HiveRelOptMaterialization@67798} > "distribute_mv_2" -> {HiveRelOptMaterialization@67800} >"db2" -> {ConcurrentHashMap@67772} size = 2 > key = "db2" > value = {ConcurrentHashMap@67772} size = 2 > "cmv_mat_view_n7" -> {HiveRelOptMaterialization@67806} > "cmv_mat_view2_n2" -> {HiveRelOptMaterialization@67808} >"count_distinct" -> {ConcurrentHashMap@67774} size = 0 > key = "count_distinct" > value = {ConcurrentHashMap@67774} size = 0 > sqlToMaterializedView = {ConcurrentHashMap@20915} size = 36 >"SELECT `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n FROM > `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON > (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE > `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, > `cmv_basetable_2_n100`.`c`" -> {ArrayList@67694} size = 0 > key = "SELECT `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n > FROM `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON > (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE > `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, > `cmv_basetable_2_n100`.`c`" > value = {ArrayList@67694} size = 0 >"select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from > `default`.`emps_parquet_n3` group by 
`emps_parquet_n3`.`empid`, > `emps_parquet_n3`.`deptno`" -> {ArrayList@67696} size = 0 > key = "select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from > `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, > `emps_parquet_n3`.`deptno`" > value = {ArrayList@67696} size = 0 >"select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from > `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" -> > {ArrayList@67698} size = 1 > key = "select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from > `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3"
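The removal step described in the pull request, dropping the inner map once its last view is removed, can be sketched with a plain nested `ConcurrentHashMap`. This is a simplified, hypothetical stand-in for `MaterializedViewsCache`; the names and value types are illustrative, not Hive's. `computeIfPresent` removes the outer entry when the remapping function returns `null`, and it runs atomically per key, so concurrent puts and removes on the same database do not race.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class NestedCacheSketch {
    // Hypothetical stand-in for MaterializedViewsCache: dbName -> viewName -> plan.
    static final ConcurrentMap<String, ConcurrentMap<String, Object>> cache =
        new ConcurrentHashMap<>();

    static void put(String db, String view, Object plan) {
        cache.computeIfAbsent(db, k -> new ConcurrentHashMap<>()).put(view, plan);
    }

    // Removing the last view of a database also removes the now-empty inner
    // map: returning null from the remapping function drops the outer entry.
    static void remove(String db, String view) {
        cache.computeIfPresent(db, (k, views) -> {
            views.remove(view);
            return views.isEmpty() ? null : views;
        });
    }

    public static void main(String[] args) {
        put("default", "mv1", new Object());
        put("default", "mv2", new Object());
        remove("default", "mv1");
        remove("default", "mv2");
        System.out.println(cache.containsKey("default")); // false: no empty inner map lingers
    }
}
```

Without the emptiness check, dropped databases would linger as empty keys in the outer map, which is the leak the bug report's cache dump shows (for example the `count_distinct` entry of size 0).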
[jira] [Assigned] (HIVE-25900) Materialized view registry does not clean non existing views at refresh
[ https://issues.apache.org/jira/browse/HIVE-25900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-25900: - > Materialized view registry does not clean non existing views at refresh > --- > > Key: HIVE-25900 > URL: https://issues.apache.org/jira/browse/HIVE-25900 > Project: Hive > Issue Type: Bug > Components: Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > > CBO plans of materialized views which are enabled for query rewrite are > cached in HS2 (MaterializedViewsCache, HiveMaterializedViewsRegistry) > The registry is refreshed periodically from HMS: > {code:java} > set hive.server2.materializedviews.registry.refresh.period=1500s; > {code} > This functionality is required when multiple HS2 instances are used in a > cluster: the MV drop operation is served by one of the HS2 instances, and the > registry is updated at that time in that instance. However, other HS2 > instances still cache the non-existent view and need to be refreshed by the > updater thread. > Currently the updater thread adds new entries and refreshes existing ones, but does > not remove the outdated entries. -- This message was sent by Atlassian Jira (v8.20.1#820001)
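The missing cleanup step can be sketched in a few lines. This is a hedged, illustrative example, not Hive's actual updater code: it assumes a flat registry keyed by view name and an HMS snapshot fetched elsewhere. If the periodic refresh retained only the names present in the latest snapshot, outdated entries would be dropped alongside the adds and updates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RegistryRefreshSketch {
    // Hypothetical registry: "db.view" -> cached materialization.
    static final Map<String, Object> registry = new ConcurrentHashMap<>();

    // A refresh that only adds and updates entries leaks dropped views;
    // retaining only the keys present in the latest HMS snapshot removes
    // the outdated entries as well.
    static void refresh(Map<String, Object> hmsSnapshot) {
        registry.putAll(hmsSnapshot);                       // add and update
        registry.keySet().retainAll(hmsSnapshot.keySet());  // drop outdated entries
    }

    public static void main(String[] args) {
        registry.put("default.dropped_mv", new Object());   // dropped on another HS2
        refresh(Map.of("default.live_mv", new Object()));   // latest HMS state
        System.out.println(registry.keySet()); // only default.live_mv remains
    }
}
```

`keySet().retainAll(...)` on a `ConcurrentHashMap` writes through to the map, so the removal is visible to concurrent readers without extra locking; a real implementation would still need to decide how to order this against in-flight lookups.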
[jira] [Updated] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25746: -- Labels: pull-request-available (was: ) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The below metrics are counted incorrectly when an exception occurs. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > The {{Initiator}}/{{Cleaner}} classes create a list of {{CompletableFuture}}s > whose {{Runnable}} core exceptions are wrapped in {{RuntimeException}}s. > The code snippet below waits for all cleaners to complete (the Initiator does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If {{CompletableFuture#join}} throws an Exception, the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. 
> * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > return reportJoin((r = result) == null ? waitingGet(false) : r); > } > {code} > (!) Suppose we have 10 cleaners and the 2nd throws an exception. The > {{catch}} block is entered and the {{failuresCounter}} is incremented. If any > of the remaining cleaners also fail, the counter won't be incremented again. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715615 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 11:51 Start Date: 26/Jan/22 11:51 Worklog Time Spent: 10m Work Description: vcsomor opened a new pull request #2974: URL: https://github.com/apache/hive/pull/2974 Fixing the compaction_initiator_failure/compaction_cleaner_failure_counter logic in the Initiator and Cleaner. After this fix, all failures will be counted, not just the first one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715615) Remaining Estimate: 0h Time Spent: 10m > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > The below metrics are counted incorrectly when an exception occurs. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > The {{Initiator}}/{{Cleaner}} classes create a list of {{CompletableFuture}}s > whose {{Runnable}} core exceptions are wrapped in {{RuntimeException}}s. > The code snippet below waits for all cleaners to complete (the Initiator does it > similarly). 
> {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If {{CompletableFuture#join}} throws an Exception, the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > return reportJoin((r = result) == null ? waitingGet(false) : r); > } > {code} > (!) Suppose we have 10 cleaners and the 2nd throws an exception. The > {{catch}} block is entered and the {{failuresCounter}} is incremented. If any > of the remaining cleaners also fail, the counter won't be incremented again. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-25899) Materialized view registry does not clean dropped views
[ https://issues.apache.org/jira/browse/HIVE-25899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-25899: - > Materialized view registry does not clean dropped views > --- > > Key: HIVE-25899 > URL: https://issues.apache.org/jira/browse/HIVE-25899 > Project: Hive > Issue Type: Bug > Components: Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > > CBO plans of materialized views which are enabled for query rewrite are > cached in HS2 (MaterializedViewsCache). > Dropping a materialized view should remove the entry from the cache; however, > the entry keys are not removed. > Cache state after running a whole PTest split: > {code} > this = {HiveMaterializedViewsRegistry@20858} > materializedViewsCache = {MaterializedViewsCache@20913} > materializedViews = {ConcurrentHashMap@67654} size = 3 >"default" -> {ConcurrentHashMap@28568} size = 8 > key = "default" > value = {ConcurrentHashMap@28568} size = 8 > "cluster_mv_2" -> {HiveRelOptMaterialization@67786} > "cluster_mv_1" -> {HiveRelOptMaterialization@67788} > "cluster_mv_4" -> {HiveRelOptMaterialization@67790} > "cluster_mv_3" -> {HiveRelOptMaterialization@67792} > "cmv_mat_view_n10" -> {HiveRelOptMaterialization@67794} > "distribute_mv_1" -> {HiveRelOptMaterialization@67796} > "distribute_mv_3" -> {HiveRelOptMaterialization@67798} > "distribute_mv_2" -> {HiveRelOptMaterialization@67800} >"db2" -> {ConcurrentHashMap@67772} size = 2 > key = "db2" > value = {ConcurrentHashMap@67772} size = 2 > "cmv_mat_view_n7" -> {HiveRelOptMaterialization@67806} > "cmv_mat_view2_n2" -> {HiveRelOptMaterialization@67808} >"count_distinct" -> {ConcurrentHashMap@67774} size = 0 > key = "count_distinct" > value = {ConcurrentHashMap@67774} size = 0 > sqlToMaterializedView = {ConcurrentHashMap@20915} size = 36 >"SELECT `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n FROM > `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON > 
(`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE > `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, > `cmv_basetable_2_n100`.`c`" -> {ArrayList@67694} size = 0 > key = "SELECT `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n > FROM `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON > (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE > `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, > `cmv_basetable_2_n100`.`c`" > value = {ArrayList@67694} size = 0 >"select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from > `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, > `emps_parquet_n3`.`deptno`" -> {ArrayList@67696} size = 0 > key = "select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from > `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, > `emps_parquet_n3`.`deptno`" > value = {ArrayList@67696} size = 0 >"select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from > `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" -> > {ArrayList@67698} size = 1 > key = "select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from > `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" > value = {ArrayList@67698} size = 1 >"SELECT `value`, `key`, `partkey` FROM (SELECT `src_txn`.`key` + 100 as > `partkey`, `src_txn`.`value`, `src_txn`.`key` FROM `default`.`src_txn`, > `default`.`src_txn_2`\nWHERE `src_txn`.`key` = `src_txn_2`.`key`\n AND > `src_txn`.`key` > 200 AND `src_txn`.`key` < 250) `cluster_mv_3`" -> > {ArrayList@67700} size = 1 > key = "SELECT `value`, `key`, `partkey` FROM (SELECT `src_txn`.`key` + > 100 as `partkey`, `src_txn`.`value`, `src_txn`.`key` FROM > `default`.`src_txn`, `default`.`src_txn_2`\nWHERE `src_txn`.`key` = > `src_txn_2`.`key`\n AND `src_txn`.`key` > 200 AND `src_txn`.`key` < 250) > `cluster_mv_3`" > value = {ArrayList@67700} size = 1 >"SELECT `cmv_basetable_n6`.`a`, `cmv_basetable_2_n3`.`c`\n FROM > 
`default`.`cmv_basetable_n6` JOIN `default`.`cmv_basetable_2_n3` ON > (`cmv_basetable_n6`.`a` = `cmv_basetable_2_n3`.`a`)\n WHERE > `cmv_basetable_2_n3`.`c` > 10.0" -> {ArrayList@67702} size = 0 > key = "SELECT `cmv_basetable_n6`.`a`, `cmv_basetable_2_n3`.`c`\n FROM > `default`.`cmv_basetable_n6` JOIN `default`.`cmv_basetable_2_n3` ON > (`cmv_basetable_n6`.`a` = `cmv_basetable_2_n3`.`a`)\n WHERE > `cmv_basetable_2_n3`.`c` > 10.0" > value = {ArrayList@67702} size = 0 >"SELECT `src_txn`.`key`, `src_txn`.`value` FROM `default`.`src_txn` where > `src_txn`.`key` > 200 and `src_txn`
[jira] [Work logged] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do
[ https://issues.apache.org/jira/browse/HIVE-25883?focusedWorklogId=715611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715611 ] ASF GitHub Bot logged work on HIVE-25883: - Author: ASF GitHub Bot Created on: 26/Jan/22 11:40 Start Date: 26/Jan/22 11:40 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #2971: URL: https://github.com/apache/hive/pull/2971#discussion_r792552279 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -434,8 +437,18 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa return success; } - private boolean hasDataBelowWatermark(FileSystem fs, Path path, long highWatermark) throws IOException { -FileStatus[] children = fs.listStatus(path); + private boolean hasDataBelowWatermark(AcidDirectory acidDir, FileSystem fs, Path path, long highWatermark) + throws IOException { +Set<Path> acidPaths = new HashSet<>(); +for (ParsedDelta delta : acidDir.getCurrentDirectories()) { + acidPaths.add(delta.getPath()); +} +if (acidDir.getBaseDirectory() != null) { + acidPaths.add(acidDir.getBaseDirectory()); +} +FileStatus[] children = fs.listStatus(path, p -> { + return !acidPaths.contains(p); +}); for (FileStatus child : children) { if (isFileBelowWatermark(child, highWatermark)) { Review comment: 1. I believe that in case there are files in the dir they should already be in the `obsolete` list; I just wanted to be conservative in this method - but I think returning true there would be correct as well 2. the `highWatermark` is inclusive; but this method's name is isBelowWatermark - so it only looks for files which are below the watermark w.r.t. `delta_1_5`; I think it's not below `5` because it contains data from `writeId` 5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715611) Time Spent: 1h 20m (was: 1h 10m) > Enhance Compaction Cleaner to skip when there is nothing to do > -- > > Key: HIVE-25883 > URL: https://issues.apache.org/jira/browse/HIVE-25883 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > the cleaner works the following way: > * it identifies obsolete directories (delta dirs that don't have open > txns) > * removes them and is done > if there are no obsolete directories, that is attributed to possibly open txns, so the request > should be retried later. > however, if for some reason the directory was already cleaned - similarly it > has no obsolete directories; and thus the request is retried forever -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do
[ https://issues.apache.org/jira/browse/HIVE-25883?focusedWorklogId=715590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715590 ] ASF GitHub Bot logged work on HIVE-25883: - Author: ASF GitHub Bot Created on: 26/Jan/22 11:00 Start Date: 26/Jan/22 11:00 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2971: URL: https://github.com/apache/hive/pull/2971#discussion_r792524567 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -434,8 +437,18 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa return success; } - private boolean hasDataBelowWatermark(FileSystem fs, Path path, long highWatermark) throws IOException { -FileStatus[] children = fs.listStatus(path); + private boolean hasDataBelowWatermark(AcidDirectory acidDir, FileSystem fs, Path path, long highWatermark) + throws IOException { +Set<Path> acidPaths = new HashSet<>(); +for (ParsedDelta delta : acidDir.getCurrentDirectories()) { + acidPaths.add(delta.getPath()); +} +if (acidDir.getBaseDirectory() != null) { + acidPaths.add(acidDir.getBaseDirectory()); +} +FileStatus[] children = fs.listStatus(path, p -> { + return !acidPaths.contains(p); +}); for (FileStatus child : children) { if (isFileBelowWatermark(child, highWatermark)) { Review comment: Commenting on the contents of isFileBelowWatermark since I can't comment there... 1. `if (!child.isDirectory()) { return false; }` There could be original files in the table directory that should be deleted. 2. `return b.getWriteId() < highWatermark;` the highWatermark is inclusive, so if for some reason the table directory contains: delta_5_5 and delta_1_5_v100 (minor compacted, which includes the data in delta_5_5), then isFileBelowWatermark would return false but it should return true. -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715590) Time Spent: 1h 10m (was: 1h) > Enhance Compaction Cleaner to skip when there is nothing to do > -- > > Key: HIVE-25883 > URL: https://issues.apache.org/jira/browse/HIVE-25883 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > the cleaner works the following way: > * it identifies obsolete directories (delta dirs that don't have open > txns) > * removes them and is done > if there are no obsolete directories, that is attributed to possibly open txns, so the request > should be retried later. > however, if for some reason the directory was already cleaned - similarly it > has no obsolete directories; and thus the request is retried forever -- This message was sent by Atlassian Jira (v8.20.1#820001)
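The inclusive-watermark point from the review discussion can be illustrated with the delta directory naming it uses. This is a toy parser, not Hive's AcidUtils; it assumes names of the form delta_minWriteId_maxWriteId with an optional _v visibility suffix, as in the examples above.

```java
public class WatermarkSketch {
    // Toy parser for delta directory names like delta_1_5 or delta_1_5_v100
    // (delta_<minWriteId>_<maxWriteId>[_v<visibilityTxnId>]).
    static long maxWriteId(String deltaDirName) {
        String[] parts = deltaDirName.split("_");
        return Long.parseLong(parts[2]);
    }

    // The highWatermark is inclusive, so delta_1_5 is NOT strictly below
    // watermark 5: it contains data written at writeId 5 itself.
    static boolean isBelowWatermark(String deltaDirName, long highWatermark) {
        return maxWriteId(deltaDirName) < highWatermark;
    }

    public static void main(String[] args) {
        System.out.println(isBelowWatermark("delta_1_4", 5));      // true
        System.out.println(isBelowWatermark("delta_1_5", 5));      // false
        System.out.println(isBelowWatermark("delta_1_5_v100", 5)); // false
    }
}
```

This matches the disagreement in the thread: with a strict `<` comparison, a compacted delta_1_5_v100 that already covers delta_5_5 is not counted as "below" watermark 5, even though the cleaner could treat the superseded delta_5_5 as removable.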
[jira] [Commented] (HIVE-25707) SchemaTool may leave the metastore in-between upgrade steps
[ https://issues.apache.org/jira/browse/HIVE-25707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482378#comment-17482378 ]

Zoltan Haindrich commented on HIVE-25707:
-----------------------------------------

[~rahulp] yes; it could probably catch a lot of problematic cases.
I wrote a test for it - but we run the SQL files using sqlline; if I disable auto-commit, the file is executed without being committed at the end... unless the JDBC driver auto-commits it...
I'll leave a reference to my branch here in case someone picks this up later:
https://github.com/kgyrtkirk/hive/tree/HIVE-25707-schematool-commit

> SchemaTool may leave the metastore in-between upgrade steps
> -----------------------------------------------------------
>
>                 Key: HIVE-25707
>                 URL: https://issues.apache.org/jira/browse/HIVE-25707
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Priority: Major
>
> It seems like:
> * schematool runs the sql files via beeline
> * autocommit is turned on
> * pressing ctrl+c or killing the process will result in an invalid schema
> https://github.com/apache/hive/blob/6e02f6164385a370ee8014c795bee1fa423d7937/beeline/src/java/org/apache/hive/beeline/schematool/HiveSchemaTool.java#L79

-- This message was sent by Atlassian Jira
(v8.20.1#820001)
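The failure mode discussed above (autocommit on, so killing the process mid-script leaves a half-upgraded schema) points toward running the whole upgrade script in a single transaction. The sketch below is hypothetical: SingleCommitSketch and splitScript are illustrative names, splitScript is deliberately naive (it ignores semicolons inside string literals and comments), and this is not how HiveSchemaTool actually works today, since it delegates to beeline/sqlline rather than raw JDBC.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleCommitSketch {

    /** Split a schema script into individual statements (naive: no string/comment handling). */
    static List<String> splitScript(String script) {
        List<String> statements = new ArrayList<>();
        for (String stmt : script.split(";")) {
            String trimmed = stmt.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements;
    }

    /** Run all statements in one transaction: either everything commits, or nothing does. */
    static void runInOneTransaction(java.sql.Connection conn, List<String> statements)
            throws java.sql.SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // without this, each statement commits on its own
        try (java.sql.Statement s = conn.createStatement()) {
            for (String sql : statements) {
                s.execute(sql);
            }
            conn.commit(); // a kill before this point leaves no statement applied
        } catch (java.sql.SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

The caveat Zoltan raises still applies: many databases (MySQL, for example) implicitly commit DDL statements regardless of the autocommit setting, so a single-transaction wrapper cannot make schema upgrades atomic on every backend.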
[jira] [Commented] (HIVE-24573) hive 3.1.2 drop table Sometimes it can't be deleted
[ https://issues.apache.org/jira/browse/HIVE-24573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482356#comment-17482356 ]

Nisarg Nagrale commented on HIVE-24573:
---------------------------------------

First check that your table is in the dc_usermanage database; otherwise, when dropping the table, qualify it as database.tablename.

> hive 3.1.2 drop table Sometimes it can't be deleted
> ---------------------------------------------------
>
>                 Key: HIVE-24573
>                 URL: https://issues.apache.org/jira/browse/HIVE-24573
>             Project: Hive
>          Issue Type: Bug
>  Affects Versions: 3.1.2
>            Reporter: paul
>            Assignee: Nisarg Nagrale
>            Priority: Blocker
>
> Executing a "drop table if exists trade_4_temp448" statement, the table cannot be deleted; hive.log shows:
> 2020-12-29T07:30:04,840 ERROR [HiveServer2-Background-Pool: Thread-6483] metadata.Hive: Table dc_usermanage.trade_3_temp448 not found: hive.dc_usermanage.trade_3_temp448 table not found
>
> The statement returns success.
>
> I suspect this problem only arises under heavy merge load. We run a lot of tasks every day, and one or two tasks a day will hit it.
>
> metastore: mysql

-- This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Assigned] (HIVE-24573) hive 3.1.2 drop table Sometimes it can't be deleted
[ https://issues.apache.org/jira/browse/HIVE-24573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nisarg Nagrale reassigned HIVE-24573:
-------------------------------------

Assignee: Nisarg Nagrale

> hive 3.1.2 drop table Sometimes it can't be deleted
> ---------------------------------------------------
>
>                 Key: HIVE-24573
>                 URL: https://issues.apache.org/jira/browse/HIVE-24573
>             Project: Hive
>          Issue Type: Bug
>  Affects Versions: 3.1.2
>            Reporter: paul
>            Assignee: Nisarg Nagrale
>            Priority: Blocker
>
> Executing a "drop table if exists trade_4_temp448" statement, the table cannot be deleted; hive.log shows:
> 2020-12-29T07:30:04,840 ERROR [HiveServer2-Background-Pool: Thread-6483] metadata.Hive: Table dc_usermanage.trade_3_temp448 not found: hive.dc_usermanage.trade_3_temp448 table not found
>
> The statement returns success.
>
> I suspect this problem only arises under heavy merge load. We run a lot of tasks every day, and one or two tasks a day will hit it.
>
> metastore: mysql

-- This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Assigned] (HIVE-25891) Improve Iceberg error message for unsupported vectorization cases
[ https://issues.apache.org/jira/browse/HIVE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nisarg Nagrale reassigned HIVE-25891:
-------------------------------------

Assignee: Nisarg Nagrale  (was: Marton Bod)

> Improve Iceberg error message for unsupported vectorization cases
> -----------------------------------------------------------------
>
>                 Key: HIVE-25891
>                 URL: https://issues.apache.org/jira/browse/HIVE-25891
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Marton Bod
>            Assignee: Nisarg Nagrale
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if you attempt to read a Parquet or Avro Iceberg table with vectorization turned on, you will eventually get an error message, since this is not supported. However, that error message is very misleading and does not explain clearly what the problem is or how to work around it.

-- This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25777) ACID: Pick the compactor transaction over insert dir
[ https://issues.apache.org/jira/browse/HIVE-25777?focusedWorklogId=715543&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715543 ]

ASF GitHub Bot logged work on HIVE-25777:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jan/22 09:23
            Start Date: 26/Jan/22 09:23
    Worklog Time Spent: 10m
      Work Description: deniskuzZ commented on a change in pull request #2968:
URL: https://github.com/apache/hive/pull/2968#discussion_r792445062

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -1812,7 +1812,12 @@ private static void processBaseDir(Path baseDir, ValidWriteIdList writeIdList, V
       directory.getAbortedWriteIds().add(parsedBase.writeId);
       return;
     }
-    if (directory.getBase() == null || directory.getBase().getWriteId() < writeId) {
+    if (directory.getBase() == null || directory.getBase().getWriteId() < writeId
+        // If there are two competing versions of a particular write-id, one from the compactor and another from IOW,
+        // always pick the compactor one once it is committed.
+        || directory.getBase().getWriteId() == writeId && parsedBase.getVisibilityTxnId() > 0

Review comment:
       fixed

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715543)
    Time Spent: 0.5h (was: 20m)

> ACID: Pick the compactor transaction over insert dir
> ----------------------------------------------------
>
>                 Key: HIVE-25777
>                 URL: https://issues.apache.org/jira/browse/HIVE-25777
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.1.2, 4.0.0
>            Reporter: Gopal Vijayaraghavan
>            Priority: Major
>              Labels: Compaction, pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If there are two competing versions of a particular write-id, one from the compactor and another from the original insert, always pick the compactor one once it is committed.
> If the directory structure looks like
> {code}
> base_11/
> base_11_v192/
> {code}
> Then always pick the v192 transaction if txnid=192 is committed.
> This is required to ensure that the raw base_ dir can be deleted safely on non-atomic directory deletions (like s3), without a race condition between getSplits and the actual file-reader.

-- This message was sent by Atlassian Jira
(v8.20.1#820001)
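The precedence rule in the patch (higher write-id wins; at an equal write-id, a committed compactor base beats the insert-written one) can be sketched as below. This is a simplification, not Hive's actual AcidUtils code: ParsedBase, pickBase, and pick are hypothetical names, the real condition in the PR has more context than shown in the diff, and the check follows klcopp's suggestion of an isCompactedBase-style helper instead of a raw visibilityTxnId comparison.

```java
public class BasePickSketch {

    /** Minimal stand-in for a parsed base dir name like "base_11" or "base_11_v192". */
    static final class ParsedBase {
        final String dirName;
        final long writeId;
        final long visibilityTxnId; // 0 for an insert-written base, > 0 for a compactor-written one

        ParsedBase(String dirName) {
            // base_<writeId>[_v<visibilityTxnId>]
            this.dirName = dirName;
            String[] parts = dirName.split("_");
            this.writeId = Long.parseLong(parts[1]);
            this.visibilityTxnId = parts.length > 2 ? Long.parseLong(parts[2].substring(1)) : 0;
        }

        boolean isCompactedBase() {
            return visibilityTxnId > 0;
        }
    }

    /** Higher writeId wins; at equal writeId the compactor base beats the insert base. */
    static ParsedBase pickBase(ParsedBase current, ParsedBase candidate) {
        if (current == null
            || current.writeId < candidate.writeId
            || (current.writeId == candidate.writeId
                && candidate.isCompactedBase() && !current.isCompactedBase())) {
            return candidate;
        }
        return current;
    }

    /** Convenience wrapper over directory names. */
    static String pick(String currentName, String candidateName) {
        return pickBase(new ParsedBase(currentName), new ParsedBase(candidateName)).dirName;
    }

    public static void main(String[] args) {
        // base_11 (raw insert) vs base_11_v192 (committed compactor output): the compactor wins.
        System.out.println(pick("base_11", "base_11_v192")); // prints base_11_v192
    }
}
```

This matches the Jira example: with both base_11/ and base_11_v192/ on disk, readers settle on the v192 base, so the raw base_11/ can be deleted later without a reader racing against a non-atomic S3 delete.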
[jira] [Work logged] (HIVE-25777) ACID: Pick the compactor transaction over insert dir
[ https://issues.apache.org/jira/browse/HIVE-25777?focusedWorklogId=715538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715538 ]

ASF GitHub Bot logged work on HIVE-25777:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jan/22 08:55
            Start Date: 26/Jan/22 08:55
    Worklog Time Spent: 10m
      Work Description: klcopp commented on a change in pull request #2968:
URL: https://github.com/apache/hive/pull/2968#discussion_r792422548

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -1812,7 +1812,12 @@ private static void processBaseDir(Path baseDir, ValidWriteIdList writeIdList, V
       directory.getAbortedWriteIds().add(parsedBase.writeId);
       return;
     }
-    if (directory.getBase() == null || directory.getBase().getWriteId() < writeId) {
+    if (directory.getBase() == null || directory.getBase().getWriteId() < writeId
+        // If there are two competing versions of a particular write-id, one from the compactor and another from IOW,
+        // always pick the compactor one once it is committed.
+        || directory.getBase().getWriteId() == writeId && parsedBase.getVisibilityTxnId() > 0

Review comment:
       Just kind of a nit: there's an `isCompactedBase` method you could use instead of `parsedBase.getVisibilityTxnId() > 0`. It doesn't do much more, but it would make this more readable.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715538)
    Time Spent: 20m (was: 10m)

> ACID: Pick the compactor transaction over insert dir
> ----------------------------------------------------
>
>                 Key: HIVE-25777
>                 URL: https://issues.apache.org/jira/browse/HIVE-25777
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.1.2, 4.0.0
>            Reporter: Gopal Vijayaraghavan
>            Priority: Major
>              Labels: Compaction, pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> If there are two competing versions of a particular write-id, one from the compactor and another from the original insert, always pick the compactor one once it is committed.
> If the directory structure looks like
> {code}
> base_11/
> base_11_v192/
> {code}
> Then always pick the v192 transaction if txnid=192 is committed.
> This is required to ensure that the raw base_ dir can be deleted safely on non-atomic directory deletions (like s3), without a race condition between getSplits and the actual file-reader.

-- This message was sent by Atlassian Jira
(v8.20.1#820001)