[jira] [Work logged] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
[ https://issues.apache.org/jira/browse/HIVE-21100?focusedWorklogId=716215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-716215 ]

ASF GitHub Bot logged work on HIVE-21100:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 27/Jan/22 06:43
            Start Date: 27/Jan/22 06:43
    Worklog Time Spent: 10m
      Work Description: pvary commented on a change in pull request #2921:
URL: https://github.com/apache/hive/pull/2921#discussion_r793293292

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
## @@ -97,6 +101,40 @@
   public MoveTask() {
     super();
   }

+  public void flattenUnionSubdirectories(Path sourcePath) throws HiveException {
+    try {
+      FileSystem fs = sourcePath.getFileSystem(conf);
+      LOG.info("Checking " + sourcePath + " for subdirectories to flatten");
+      Set<Path> unionSubdirs = new HashSet<>();
+      if (fs.exists(sourcePath)) {
+        RemoteIterator<LocatedFileStatus> i = fs.listFiles(sourcePath, true);
+        String prefix = AbstractFileMergeOperator.UNION_SUDBIR_PREFIX;
+        while (i.hasNext()) {
+          Path path = i.next().getPath();
+          Path parent = path.getParent();
+          if (parent.getName().startsWith(prefix)) {
+            // We do the rename by including the name of the parent directory in the filename so that there are
+            // no clashes when we move the files to the parent directory. Ex. HIVE_UNION_SUBDIR_1/00_0 -> 1_00_0
+            String parentOfParent = parent.getParent().toString();
+            String parentNameSuffix = parent.getName().substring(prefix.length());
+
+            fs.rename(path, new Path(parentOfParent + "/" + parentNameSuffix + "_" + path.getName()));

Review comment:
       What happens if this filename is already used?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 716215)
    Time Spent: 2h (was: 1h 50m)

> Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-21100
>                 URL: https://issues.apache.org/jira/browse/HIVE-21100
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, HIVE-21100.3.patch, HIVE-21100.patch
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and the clause UNION ALL is the last step of the query, Hive on Tez will create a subdirectory for each branch of the UNION ALL.
> With this patch the subdirectories are removed, and the files are renamed and moved to the parent directory.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
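The review question above asks what happens when the flattened target name already exists. A minimal, pure-Java sketch of one collision-safe naming scheme (illustrative class and method names, not the patch's actual code): build the preferred `branch_filename` name, and append a numeric suffix only when that name is already taken.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: collision-safe target names when flattening HIVE_UNION_SUBDIR_N directories. */
public class UnionFlattenSketch {
    static final String PREFIX = "HIVE_UNION_SUBDIR_";

    /**
     * Builds the target file name for a file inside a union subdirectory,
     * appending a numeric suffix if the preferred name is already taken.
     */
    public static String targetName(String subdirName, String fileName, Set<String> existing) {
        String branch = subdirName.substring(PREFIX.length());  // e.g. "1"
        String candidate = branch + "_" + fileName;             // e.g. "1_000000_0"
        int attempt = 0;
        while (existing.contains(candidate)) {
            attempt++;
            candidate = branch + "_" + fileName + "_" + attempt;
        }
        existing.add(candidate);
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> taken = new HashSet<>();
        System.out.println(targetName("HIVE_UNION_SUBDIR_1", "000000_0", taken)); // 1_000000_0
        System.out.println(targetName("HIVE_UNION_SUBDIR_1", "000000_0", taken)); // 1_000000_0_1
    }
}
```

In the real `MoveTask` code the `existing` set would have to be derived from a listing of the destination directory before renaming; a rename into an occupied HDFS path would otherwise simply fail rather than overwrite.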
[jira] [Work logged] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
[ https://issues.apache.org/jira/browse/HIVE-21100?focusedWorklogId=716213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-716213 ]

ASF GitHub Bot logged work on HIVE-21100:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 27/Jan/22 06:42
            Start Date: 27/Jan/22 06:42
    Worklog Time Spent: 10m
      Work Description: pvary commented on a change in pull request #2921:
URL: https://github.com/apache/hive/pull/2921#discussion_r793292863

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
## @@ -97,6 +101,40 @@
   public MoveTask() {
     super();
   }

+  public void flattenUnionSubdirectories(Path sourcePath) throws HiveException {
+    try {
+      FileSystem fs = sourcePath.getFileSystem(conf);
+      LOG.info("Checking " + sourcePath + " for subdirectories to flatten");
+      Set<Path> unionSubdirs = new HashSet<>();
+      if (fs.exists(sourcePath)) {
+        RemoteIterator<LocatedFileStatus> i = fs.listFiles(sourcePath, true);

Review comment:
       You have mentioned that ACID does not need this. Could we avoid these calls when they are not needed? Otherwise we make every query slower.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 716213)
    Time Spent: 1h 50m (was: 1h 40m)

> Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-21100
>                 URL: https://issues.apache.org/jira/browse/HIVE-21100
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, HIVE-21100.3.patch, HIVE-21100.patch
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and the clause UNION ALL is the last step of the query, Hive on Tez will create a subdirectory for each branch of the UNION ALL.
> With this patch the subdirectories are removed, and the files are renamed and moved to the parent directory.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
[ https://issues.apache.org/jira/browse/HIVE-21100?focusedWorklogId=716209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-716209 ]

ASF GitHub Bot logged work on HIVE-21100:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 27/Jan/22 06:38
            Start Date: 27/Jan/22 06:38
    Worklog Time Spent: 10m
      Work Description: pvary commented on a change in pull request #2921:
URL: https://github.com/apache/hive/pull/2921#discussion_r793291201

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
## @@ -97,6 +101,40 @@
   public MoveTask() {
     super();
  }

+  public void flattenUnionSubdirectories(Path sourcePath) throws HiveException {
+    try {
+      FileSystem fs = sourcePath.getFileSystem(conf);
+      LOG.info("Checking " + sourcePath + " for subdirectories to flatten");
+      Set<Path> unionSubdirs = new HashSet<>();
+      if (fs.exists(sourcePath)) {

Review comment:
       This is a costly call. We get the same result by catching the relevant exception in `listFiles`, and we can save one FileSystem.exists call.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 716209)
    Time Spent: 1h 40m (was: 1.5h)

> Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-21100
>                 URL: https://issues.apache.org/jira/browse/HIVE-21100
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, HIVE-21100.3.patch, HIVE-21100.patch
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and the clause UNION ALL is the last step of the query, Hive on Tez will create a subdirectory for each branch of the UNION ALL.
> With this patch the subdirectories are removed, and the files are renamed and moved to the parent directory.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
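The review point above is that probing with `exists()` before listing costs an extra round trip to the filesystem (on HDFS, an extra NameNode RPC); listing directly and treating "not found" as an empty result gives the same behavior for one call less. A minimal sketch of the pattern using `java.nio.file` instead of the Hadoop `FileSystem` API (illustrative names; the real patch would catch Hadoop's `FileNotFoundException` around `listFiles`):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sketch: list a directory without a separate exists() probe. */
public class ListWithoutExists {
    /**
     * Lists a directory, treating a missing directory as "no entries"
     * instead of probing with an extra exists() call first.
     */
    public static List<Path> listOrEmpty(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                result.add(p);
            }
        } catch (NoSuchFileException e) {
            // Missing directory: same outcome as exists() == false, one round trip cheaper.
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Prints 0 as long as this directory does not exist in the working directory.
        System.out.println(listOrEmpty(Paths.get("surely-missing-dir-29481")).size());
    }
}
```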
[jira] [Updated] (HIVE-25903) Upgrade Joda time version
[ https://issues.apache.org/jira/browse/HIVE-25903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatasubrahmanian Narayanan updated HIVE-25903:
-------------------------------------------------
    Attachment: HIVE-25903.patch

> Upgrade Joda time version
> -------------------------
>
>                 Key: HIVE-25903
>                 URL: https://issues.apache.org/jira/browse/HIVE-25903
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Venkatasubrahmanian Narayanan
>            Assignee: Venkatasubrahmanian Narayanan
>            Priority: Minor
>         Attachments: HIVE-25903.patch
>
> Hive uses an older version of Joda time, which can cause issues with some workflows. Switching over to the latest version resolves the issue from Hive's end.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Assigned] (HIVE-25903) Upgrade Joda time version
[ https://issues.apache.org/jira/browse/HIVE-25903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatasubrahmanian Narayanan reassigned HIVE-25903:
----------------------------------------------------

> Upgrade Joda time version
> -------------------------
>
>                 Key: HIVE-25903
>                 URL: https://issues.apache.org/jira/browse/HIVE-25903
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Venkatasubrahmanian Narayanan
>            Assignee: Venkatasubrahmanian Narayanan
>            Priority: Minor
>
> Hive uses an older version of Joda time, which can cause issues with some workflows. Switching over to the latest version resolves the issue from Hive's end.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25893) NPE when reading Parquet data because ColumnVector isNull[] is not updated
[ https://issues.apache.org/jira/browse/HIVE-25893?focusedWorklogId=715996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715996 ]

ASF GitHub Bot logged work on HIVE-25893:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 20:48
            Start Date: 26/Jan/22 20:48
    Worklog Time Spent: 10m
      Work Description: soumyakanti3578 commented on a change in pull request #2970:
URL: https://github.com/apache/hive/pull/2970#discussion_r793035137

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
## @@ -500,22 +501,37 @@ private boolean compareDecimalColumnVector(DecimalColumnVector cv1, DecimalColum

   private boolean compareBytesColumnVector(BytesColumnVector cv1, BytesColumnVector cv2) {
     int length1 = cv1.vector.length;
     int length2 = cv2.vector.length;
-    if (length1 == length2) {
-      for (int i = 0; i < length1; i++) {
-        int innerLen1 = cv1.vector[i].length;
-        int innerLen2 = cv2.vector[i].length;
-        if (innerLen1 == innerLen2) {
-          for (int j = 0; j < innerLen1; j++) {
-            if (cv1.vector[i][j] != cv2.vector[i][j]) {
-              return false;
-            }
-          }
-        } else {
+    if (length1 != length2) {
+      return false;
+    }
+
+    for (int i = 0; i < length1; i++) {
+      // check for different nulls
+      if (columnVectorsDifferNullForSameIndex(cv1, cv2, i)) {
+        return false;
+      }
+
+      // if they are both null, continue
+      // else if one of them is null, return false
+      if (cv1.isNull[i] && cv2.isNull[i]) {
+        continue;
+      } else if (cv1.isNull[i] || cv2.isNull[i]) {

Review comment:
       Thanks @kasakrisz! Removed it.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715996)
    Time Spent: 40m (was: 0.5h)

> NPE when reading Parquet data because ColumnVector isNull[] is not updated
> --------------------------------------------------------------------------
>
>                 Key: HIVE-25893
>                 URL: https://issues.apache.org/jira/browse/HIVE-25893
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Soumyakanti Das
>            Assignee: Soumyakanti Das
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> In [VectorizedListColumnReader.java|https://github.com/apache/hive/blob/595f3bc9d612f02581bd3377ee0107efd6553ae6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java], {{isNull[]}} is used in the comparison methods (e.g. [columnVectorsDifferNullForSameIndex|https://github.com/apache/hive/blob/595f3bc9d612f02581bd3377ee0107efd6553ae6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java#L524]); however, {{isNull}} is always {{false}} as it is never updated in [getChildData|https://github.com/apache/hive/blob/595f3bc9d612f02581bd3377ee0107efd6553ae6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java#L401].
> This could result in a NullPointerException like:
> {code}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.compareBytesColumnVector(VectorizedListColumnReader.java:506)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.compareColumnVector(VectorizedListColumnReader.java:432)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.setIsRepeating(VectorizedListColumnReader.java:367)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.convertValueListToListColumnVector(VectorizedListColumnReader.java:360)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:83)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:438)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:377)
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:100)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.
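The bug and fix above come down to checking the null masks before dereferencing values: if the masks differ the vectors differ, if both slots are null they are equal, and only non-null slots get their bytes compared. A self-contained sketch of that comparison logic with plain arrays standing in for Hive's `BytesColumnVector` (illustrative names, not the patch's actual code):

```java
import java.util.Arrays;

/** Sketch: null-mask-aware comparison of two byte-array "column vectors". */
public class NullAwareCompare {
    /**
     * Compares two value arrays plus null masks the way the patched
     * compareBytesColumnVector does: null masks are checked first,
     * so null slots are never dereferenced (no NPE).
     */
    public static boolean vectorsEqual(byte[][] v1, boolean[] null1, byte[][] v2, boolean[] null2) {
        if (v1.length != v2.length) {
            return false;
        }
        for (int i = 0; i < v1.length; i++) {
            if (null1[i] != null2[i]) {
                return false;     // one side null, the other not
            }
            if (null1[i]) {
                continue;         // both null: equal at this slot, skip dereference
            }
            if (!Arrays.equals(v1[i], v2[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[][] a = { "x".getBytes(), null };
        byte[][] b = { "x".getBytes(), null };
        boolean[] nulls = { false, true };
        System.out.println(vectorsEqual(a, nulls, b, nulls)); // true: null slot never dereferenced
    }
}
```

The original code crashed precisely because `isNull` was never populated, so `cv1.vector[i].length` was evaluated on a slot whose value array entry was `null`.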
[jira] [Work logged] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do
[ https://issues.apache.org/jira/browse/HIVE-25883?focusedWorklogId=715831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715831 ]

ASF GitHub Bot logged work on HIVE-25883:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 17:18
            Start Date: 26/Jan/22 17:18
    Worklog Time Spent: 10m
      Work Description: kgyrtkirk commented on a change in pull request #2971:
URL: https://github.com/apache/hive/pull/2971#discussion_r792867605

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
## @@ -434,8 +437,18 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
     return success;
   }

-  private boolean hasDataBelowWatermark(FileSystem fs, Path path, long highWatermark) throws IOException {
-    FileStatus[] children = fs.listStatus(path);
+  private boolean hasDataBelowWatermark(AcidDirectory acidDir, FileSystem fs, Path path, long highWatermark)
+      throws IOException {
+    Set<Path> acidPaths = new HashSet<>();
+    for (ParsedDelta delta : acidDir.getCurrentDirectories()) {
+      acidPaths.add(delta.getPath());
+    }
+    if (acidDir.getBaseDirectory() != null) {
+      acidPaths.add(acidDir.getBaseDirectory());
+    }
+    FileStatus[] children = fs.listStatus(path, p -> {
+      return !acidPaths.contains(p);
+    });
     for (FileStatus child : children) {
       if (isFileBelowWatermark(child, highWatermark)) {

Review comment:
       After some thinking I convinced myself that you are right :)
       * I've changed it to return `true` for non-directories
         * in the background these should appear as `obsolete` files anyway, so it should not cause any real trouble in the scope of HIVE-25883; but it makes the method live up to its name...
       * since the latest patch we are checking and excluding all the dirs the actual acid dir is using - so if we have anything below or even at the writeid level, that should be considered invalid; the `nothingToCleanAfterAbortsDelta` testcase is a "complicated" case, but the default case is similar to this with the new checks.
Pushed a new commit to update these.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715831)
    Time Spent: 1h 40m (was: 1.5h)

> Enhance Compaction Cleaner to skip when there is nothing to do
> --------------------------------------------------------------
>
>                 Key: HIVE-25883
>                 URL: https://issues.apache.org/jira/browse/HIVE-25883
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The cleaner works the following way:
> * it identifies obsolete directories (delta dirs that have no open txns)
> * removes them, and it is done
> If there are no obsolete directories, that is attributed to possibly open txns, so the request should be retried later.
> However, if for some reason the directory was already cleaned, it similarly has no obsolete directories, and thus the request is retried forever.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
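The patch discussed above passes a filter to `listStatus` so the listing skips the paths the Cleaner already knows are live (the current base and delta directories). A minimal sketch of the same pattern using `java.nio.file` in place of Hadoop's `FileSystem.listStatus(Path, PathFilter)` (illustrative names and directory layout, not the actual Cleaner code):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch: list a directory while excluding paths already known to be live. */
public class FilteredListing {
    /**
     * Lists the entries of a directory, skipping paths the caller already
     * knows are in use (base/delta dirs in the Cleaner's case), so only
     * candidates for cleanup are returned.
     */
    public static List<Path> listExcluding(Path dir, Set<Path> live) throws IOException {
        List<Path> result = new ArrayList<>();
        // The filter runs server-side of the listing loop: excluded paths never reach the caller.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, p -> !live.contains(p))) {
            for (Path p : stream) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cleaner");
        Path base = Files.createDirectory(dir.resolve("base_5"));        // "live" dir, excluded
        Files.createDirectory(dir.resolve("delta_1_1"));                 // stale dir, kept
        for (Path p : listExcluding(dir, Set.of(base))) {
            System.out.println(p.getFileName()); // only delta_1_1 survives the filter
        }
    }
}
```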
[jira] [Work logged] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
[ https://issues.apache.org/jira/browse/HIVE-21100?focusedWorklogId=715820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715820 ]

ASF GitHub Bot logged work on HIVE-21100:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 16:53
            Start Date: 26/Jan/22 16:53
    Worklog Time Spent: 10m
      Work Description: hsnusonic commented on pull request #2921:
URL: https://github.com/apache/hive/pull/2921#issuecomment-1022392409

       @pvary After some manual testing, I found this UNION_SUBDIR doesn't exist for ACID tables. It only exists for external tables on Tez, so I added a qtest to tez. Could you help review it?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715820)
    Time Spent: 1.5h (was: 1h 20m)

> Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-21100
>                 URL: https://issues.apache.org/jira/browse/HIVE-21100
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, HIVE-21100.3.patch, HIVE-21100.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and the clause UNION ALL is the last step of the query, Hive on Tez will create a subdirectory for each branch of the UNION ALL.
> With this patch the subdirectories are removed, and the files are renamed and moved to the parent directory.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25809) Implement URI Mapping for KuduStorageHandler in Hive
[ https://issues.apache.org/jira/browse/HIVE-25809?focusedWorklogId=715789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715789 ]

ASF GitHub Bot logged work on HIVE-25809:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 16:19
            Start Date: 26/Jan/22 16:19
    Worklog Time Spent: 10m
      Work Description: saihemanth-cloudera closed pull request #2877:
URL: https://github.com/apache/hive/pull/2877

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715789)
    Time Spent: 0.5h (was: 20m)

> Implement URI Mapping for KuduStorageHandler in Hive
> -----------------------------------------------------
>
>                 Key: HIVE-25809
>                 URL: https://issues.apache.org/jira/browse/HIVE-25809
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Security
>            Reporter: Sai Hemanth Gantasala
>            Assignee: Sai Hemanth Gantasala
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, there is no storage URI mapping for KuduStorageHandler based on the feature HIVE-24705. The API getURIForAuth() needs to be implemented in KuduStorageHandler.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25897) Move delta metric collection into AcidMetricsService
[ https://issues.apache.org/jira/browse/HIVE-25897?focusedWorklogId=715737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715737 ]

ASF GitHub Bot logged work on HIVE-25897:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 14:59
            Start Date: 26/Jan/22 14:59
    Worklog Time Spent: 10m
      Work Description: lcspinter commented on a change in pull request #2973:
URL: https://github.com/apache/hive/pull/2973#discussion_r792724777

## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
## @@ -1600,7 +1602,9 @@ public static ConfVars getMetaConf(String name) {
     // Deprecated Hive values that we are keeping for backwards compatibility.
     @Deprecated
     HIVE_CODAHALE_METRICS_REPORTER_CLASSES("hive.service.metrics.codahale.reporter.classes",
-        "hive.service.metrics.codahale.reporter.classes", "",
+        "hive.service.metrics.codahale.reporter.classes",
+        "org.apache.hadoop.hive.common.metrics.metrics2.JsonFileMetricsReporter, " +
+            "org.apache.hadoop.hive.common.metrics.metrics2.JmxMetricsReporter",

Review comment:
       This param is used to initialize the CodahaleMetricsReporter classes. Some unit tests use it, and since those tests were moved from the `hive-common` module to the `standalone-metastore-common` module, I had to add these values to the `MetastoreConf` to keep backward compatibility.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715737)
    Time Spent: 50m (was: 40m)

> Move delta metric collection into AcidMetricsService
> -----------------------------------------------------
>
>                 Key: HIVE-25897
>                 URL: https://issues.apache.org/jira/browse/HIVE-25897
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> DeltaFilesMetricReporter and AcidMetricsService are two different threads collecting ACID related metrics. It makes sense to merge those threads since they share the same goal.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25897) Move delta metric collection into AcidMetricsService
[ https://issues.apache.org/jira/browse/HIVE-25897?focusedWorklogId=715734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715734 ]

ASF GitHub Bot logged work on HIVE-25897:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 14:55
            Start Date: 26/Jan/22 14:55
    Worklog Time Spent: 10m
      Work Description: lcspinter commented on a change in pull request #2973:
URL: https://github.com/apache/hive/pull/2973#discussion_r792720421

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
## @@ -429,7 +422,8 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
         .map(Path::getName).collect(Collectors.joining(",")));
     boolean success = remove(location, ci, obsoleteDirs, true, fs, extraDebugInfo);
     if (dir.getObsolete().size() > 0) {
-      updateDeltaFilesMetrics(ci.dbname, ci.tableName, ci.partName, dir.getObsolete());
+      AcidMetricService.updateMetricsFromCleaner(ci.dbname, ci.tableName, ci.partName, dir.getObsolete(), conf,

Review comment:
       You're right. Fixed it.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715734)
    Time Spent: 40m (was: 0.5h)

> Move delta metric collection into AcidMetricsService
> -----------------------------------------------------
>
>                 Key: HIVE-25897
>                 URL: https://issues.apache.org/jira/browse/HIVE-25897
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> DeltaFilesMetricReporter and AcidMetricsService are two different threads collecting ACID related metrics. It makes sense to merge those threads since they share the same goal.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25897) Move delta metric collection into AcidMetricsService
[ https://issues.apache.org/jira/browse/HIVE-25897?focusedWorklogId=715719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715719 ]

ASF GitHub Bot logged work on HIVE-25897:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 14:35
            Start Date: 26/Jan/22 14:35
    Worklog Time Spent: 10m
      Work Description: klcopp commented on a change in pull request #2973:
URL: https://github.com/apache/hive/pull/2973#discussion_r792691109

## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
## @@ -1600,7 +1602,9 @@ public static ConfVars getMetaConf(String name) {
     // Deprecated Hive values that we are keeping for backwards compatibility.
     @Deprecated
     HIVE_CODAHALE_METRICS_REPORTER_CLASSES("hive.service.metrics.codahale.reporter.classes",
-        "hive.service.metrics.codahale.reporter.classes", "",
+        "hive.service.metrics.codahale.reporter.classes",
+        "org.apache.hadoop.hive.common.metrics.metrics2.JsonFileMetricsReporter, " +
+            "org.apache.hadoop.hive.common.metrics.metrics2.JmxMetricsReporter",

Review comment:
       What does this do? Especially since the config is deprecated?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715719)
    Time Spent: 0.5h (was: 20m)

> Move delta metric collection into AcidMetricsService
> -----------------------------------------------------
>
>                 Key: HIVE-25897
>                 URL: https://issues.apache.org/jira/browse/HIVE-25897
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> DeltaFilesMetricReporter and AcidMetricsService are two different threads collecting ACID related metrics.
> It makes sense to merge those threads since they share the same goal.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715716 ]

ASF GitHub Bot logged work on HIVE-25746:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 14:21
            Start Date: 26/Jan/22 14:21
    Worklog Time Spent: 10m
      Work Description: vcsomor commented on a change in pull request #2974:
URL: https://github.com/apache/hive/pull/2974#discussion_r792685276

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
## @@ -155,18 +159,28 @@ public void run() {
     // when min_history_level is finally dropped, than every HMS will commit compaction the new way
     // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead.
     for (CompactionInfo compactionInfo : readyToClean) {
-      cleanerList.add(CompletableFuture.runAsync(ThrowingRunnable.unchecked(
-          () -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor));
+      String tableName = compactionInfo.getFullTableName();
+      String partition = compactionInfo.getFullPartitionName();
+      CompletableFuture<Void> asyncJob =
+          CompletableFuture.runAsync(
+                  ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)),
+                  cleanerExecutor)
+              .exceptionally(t -> {
+                cleanerErrors.incrementAndGet();
+                LOG.error("Error during the cleaning the table {} / partition {}", tableName, partition, t);
+                return null;
+              });
+      cleanerList.add(asyncJob);
     }
     CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
+
+    if (metricsEnabled && handle != null) {

Review comment:
       This is why I left it there; at the same time I've removed it from the Initiator, where it cannot be null.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715716)
    Time Spent: 1h 40m (was: 1.5h)

> Compaction Failure Counter counted incorrectly
> ----------------------------------------------
>
>                 Key: HIVE-25746
>                 URL: https://issues.apache.org/jira/browse/HIVE-25746
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 4.0.0
>            Reporter: Viktor Csomor
>            Assignee: Viktor Csomor
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The count of the below metrics is incorrect upon an exception:
> - {{compaction_initator_failure_counter}}
> - {{compaction_cleaner_failure_counter}}
> Reasoning:
> The {{Initiator}}/{{Cleaner}} classes create a list of {{CompletableFuture}}s whose {{Runnable}} core exceptions are wrapped in {{RuntimeException}}s. The below code snippet waits for all cleaners to complete (the Initiator does it similarly):
> {code:java}
> try {
>   for (CompactionInfo compactionInfo : readyToClean) {
>     cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() ->
>         clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor));
>   }
>   CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
> } catch (Throwable t) {
>   // the lock timeout on AUX lock, should be ignored.
>   if (metricsEnabled && handle != null) {
>     failuresCounter.inc();
>   }
> {code}
> If the {{CompletableFuture#join}} throws an Exception, then the failure counter is incremented only once.
> Docs:
> {code}
> /**
>  * Returns the result value when complete, or throws an
>  * (unchecked) exception if completed exceptionally. To better
>  * conform with the use of common functional forms, if a
>  * computation involved in the completion of this
>  * CompletableFuture threw an exception, this method throws an
>  * (unchecked) {@link CompletionException} with the underlying
>  * exception as its cause.
>  *
>  * @return the result value
>  * @throws CancellationException if the computation was cancelled
>  * @throws CompletionException if this future completed
>  * exceptionally or a completion computation
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715714 ]

ASF GitHub Bot logged work on HIVE-25746:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Jan/22 14:20
            Start Date: 26/Jan/22 14:20
    Worklog Time Spent: 10m
      Work Description: vcsomor commented on a change in pull request #2974:
URL: https://github.com/apache/hive/pull/2974#discussion_r792684511

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
## @@ -155,18 +159,28 @@ public void run() {
     // when min_history_level is finally dropped, than every HMS will commit compaction the new way
     // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead.
     for (CompactionInfo compactionInfo : readyToClean) {
-      cleanerList.add(CompletableFuture.runAsync(ThrowingRunnable.unchecked(
-          () -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor));
+      String tableName = compactionInfo.getFullTableName();
+      String partition = compactionInfo.getFullPartitionName();
+      CompletableFuture<Void> asyncJob =
+          CompletableFuture.runAsync(
+                  ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)),
+                  cleanerExecutor)
+              .exceptionally(t -> {
+                cleanerErrors.incrementAndGet();
+                LOG.error("Error during the cleaning the table {} / partition {}", tableName, partition, t);
+                return null;
+              });
+      cleanerList.add(asyncJob);
     }
     CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
+
+    if (metricsEnabled && handle != null) {

Review comment:
       According to IntelliJ's code-path analyzer, it might happen to be null.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715714) Time Spent: 1.5h (was: 1h 20m) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. 
To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public
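The fix under review swaps the single try/catch around join() for a per-future exceptionally() callback. A standalone sketch of that pattern follows; the class and method names here are illustrative, not Hive's actual ones:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncFailureCounter {

    // Runs every job and counts every failure. A counter local to the cycle
    // (rather than a long-lived field) keeps each run's metric independent.
    public static int runAndCountFailures(List<Runnable> jobs) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        AtomicInteger errors = new AtomicInteger();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Runnable job : jobs) {
            futures.add(CompletableFuture.runAsync(job, executor)
                    .exceptionally(t -> {
                        errors.incrementAndGet(); // every failed job is counted, not only the first
                        return null;
                    }));
        }
        // join() cannot throw here: exceptionally() already turned each
        // exceptional completion into a normal one.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        executor.shutdown();
        return errors.get();
    }

    public static void main(String[] args) {
        List<Runnable> jobs = new ArrayList<>();
        jobs.add(() -> { });
        jobs.add(() -> { throw new RuntimeException("boom"); });
        jobs.add(() -> { throw new IllegalStateException("bang"); });
        System.out.println(runAndCountFailures(jobs)); // prints 2
    }
}
```

Because each future completes normally via exceptionally(), the final allOf(...).join() can no longer hide later failures behind the first thrown CompletionException, which is exactly the miscounting described in the issue.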
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715711 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 14:19 Start Date: 26/Jan/22 14:19 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792682742 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -155,18 +159,28 @@ public void run() { // when min_history_level is finally dropped, than every HMS will commit compaction the new way // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead. for (CompactionInfo compactionInfo : readyToClean) { - cleanerList.add(CompletableFuture.runAsync(ThrowingRunnable.unchecked( - () -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor)); + String tableName = compactionInfo.getFullTableName(); + String partition = compactionInfo.getFullPartitionName(); + CompletableFuture asyncJob = + CompletableFuture.runAsync( + ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), + cleanerExecutor) + .exceptionally(t -> { +cleanerErrors.incrementAndGet(); +LOG.error("Error during the cleaning the table {} / partition {}", tableName, partition, t); +return null; + }); + cleanerList.add(asyncJob); } CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join(); + +if (metricsEnabled && handle != null) { + failuresCounter.inc(cleanerErrors.get()); Review comment: disregard -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715711) Time Spent: 1h 20m (was: 1h 10m) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. 
To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ >
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715709 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 14:18 Start Date: 26/Jan/22 14:18 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792681772 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -155,18 +159,28 @@ public void run() { // when min_history_level is finally dropped, than every HMS will commit compaction the new way // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead. for (CompactionInfo compactionInfo : readyToClean) { - cleanerList.add(CompletableFuture.runAsync(ThrowingRunnable.unchecked( - () -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor)); + String tableName = compactionInfo.getFullTableName(); + String partition = compactionInfo.getFullPartitionName(); + CompletableFuture asyncJob = + CompletableFuture.runAsync( + ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), + cleanerExecutor) + .exceptionally(t -> { +cleanerErrors.incrementAndGet(); +LOG.error("Error during the cleaning the table {} / partition {}", tableName, partition, t); +return null; + }); + cleanerList.add(asyncJob); } CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join(); + +if (metricsEnabled && handle != null) { + failuresCounter.inc(cleanerErrors.get()); Review comment: i don't think this is correct as cleanerErrors is a global variable and would be incremented on every iteration -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715709) Time Spent: 1h 10m (was: 1h) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. 
To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this f
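On the point that cleanerErrors is a long-lived field: feeding its raw value into failuresCounter.inc(...) on every cycle would re-report failures from earlier cycles. A hedged sketch (illustrative class, not Hive's) of reporting only the per-cycle delta:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CycleCounter {
    // Total failures since startup; incremented from the async callbacks.
    private final AtomicInteger totalErrors = new AtomicInteger();
    // How many failures the metric has already been told about.
    // Only read and written from the single thread driving the cycles.
    private long reportedSoFar = 0;

    public void recordError() {
        totalErrors.incrementAndGet();
    }

    // Called once per cycle; returns only the failures added since the
    // previous call, so a metrics counter can be inc()'d by this amount.
    public long errorsThisCycle() {
        long total = totalErrors.get();
        long delta = total - reportedSoFar;
        reportedSoFar = total;
        return delta;
    }
}
```

The alternative, a fresh AtomicInteger created inside each run() iteration, avoids this bookkeeping entirely.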
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715708 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 14:15 Start Date: 26/Jan/22 14:15 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792679287 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -155,18 +159,28 @@ public void run() { // when min_history_level is finally dropped, than every HMS will commit compaction the new way // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead. for (CompactionInfo compactionInfo : readyToClean) { - cleanerList.add(CompletableFuture.runAsync(ThrowingRunnable.unchecked( - () -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor)); + String tableName = compactionInfo.getFullTableName(); + String partition = compactionInfo.getFullPartitionName(); + CompletableFuture asyncJob = + CompletableFuture.runAsync( + ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)), + cleanerExecutor) + .exceptionally(t -> { +cleanerErrors.incrementAndGet(); +LOG.error("Error during the cleaning the table {} / partition {}", tableName, partition, t); +return null; + }); + cleanerList.add(asyncJob); } CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join(); + +if (metricsEnabled && handle != null) { Review comment: can handle be null here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715708) Time Spent: 1h (was: 50m) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. 
To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > ret
[jira] [Work logged] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do
[ https://issues.apache.org/jira/browse/HIVE-25883?focusedWorklogId=715670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715670 ] ASF GitHub Bot logged work on HIVE-25883: - Author: ASF GitHub Bot Created on: 26/Jan/22 13:22 Start Date: 26/Jan/22 13:22 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2971: URL: https://github.com/apache/hive/pull/2971#discussion_r792628396 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -434,8 +437,18 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa return success; } - private boolean hasDataBelowWatermark(FileSystem fs, Path path, long highWatermark) throws IOException { -FileStatus[] children = fs.listStatus(path); + private boolean hasDataBelowWatermark(AcidDirectory acidDir, FileSystem fs, Path path, long highWatermark) + throws IOException { +Set acidPaths = new HashSet<>(); +for (ParsedDelta delta : acidDir.getCurrentDirectories()) { + acidPaths.add(delta.getPath()); +} +if (acidDir.getBaseDirectory() != null) { + acidPaths.add(acidDir.getBaseDirectory()); +} +FileStatus[] children = fs.listStatus(path, p -> { + return !acidPaths.contains(p); +}); for (FileStatus child : children) { if (isFileBelowWatermark(child, highWatermark)) { Review comment: 1. > I believe that in case there are files in the dir they already should be in the obsolete list Not necessarily, because the AcidDirectory the Cleaner uses is computed based on an older txnId (cleanerWaterMark), so there is a chance its obsolete list does not contain files that should be cleaned up eventually, which is what this method is supposed to figure out. (Right?) @deniskuzZ please correct me if I'm wrong about this since I know there have been recent changes to this logic 2. 
I meant that if the table dir contains:
- delta_5_5
- delta_1_5_v100 (minor compacted)
Then the cleaner should eventually remove delta_5_5, so there will be files to remove later, when the cleanerWaterMark is high enough -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715670) Time Spent: 1.5h (was: 1h 20m) > Enhance Compaction Cleaner to skip when there is nothing to do > -- > > Key: HIVE-25883 > URL: https://issues.apache.org/jira/browse/HIVE-25883 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > the cleaner works the following way: > * it identifies obsolete directories (delta dirs that don't have open > txns) > * removes them and is done > if there are no obsolete directories, that is attributed to the possibility that there might be > open txns, so the request should be retried later. > however, if for some reason the directory was already cleaned, it similarly has > no obsolete directories, and thus the request is retried forever
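The review above is about listing a table directory while skipping the base/delta paths the AcidDirectory already tracks, then checking whether anything left sits at or below the watermark. A simplified stand-in using java.nio instead of Hadoop's FileSystem API; the writeId parsing and the below-watermark test here are assumptions, much cruder than Hive's real ParsedDelta logic:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.stream.Stream;

public class WatermarkCheck {

    // Crude stand-in for Hive's delta/base name parsing: extracts the max
    // writeId from names like "delta_1_5" or "base_7"; -1 for anything else.
    static long maxWriteId(String name) {
        String[] parts = name.split("_");
        if (name.startsWith("delta") && parts.length >= 3) return Long.parseLong(parts[2]);
        if (name.startsWith("base") && parts.length >= 2) return Long.parseLong(parts[1]);
        return -1;
    }

    // Mirrors the reviewed hasDataBelowWatermark: ignore children that the
    // AcidDirectory already accounts for (live base/deltas), then report
    // whether any remaining child still holds data at or below the watermark.
    static boolean hasDataBelowWatermark(Path tableDir, Set<Path> acidPaths, long highWatermark)
            throws IOException {
        try (Stream<Path> children = Files.list(tableDir)) {
            return children
                    .filter(p -> !acidPaths.contains(p))
                    .anyMatch(p -> {
                        long id = maxWriteId(p.getFileName().toString());
                        return id < 0 || id <= highWatermark; // unparseable names count as data
                    });
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tbl");
        Files.createDirectory(dir.resolve("delta_5_5"));
        Path compacted = Files.createDirectory(dir.resolve("delta_1_5_v100"));
        // Only the compacted delta is tracked as live; delta_5_5 is not,
        // so it is data the cleaner will eventually have to remove.
        System.out.println(hasDataBelowWatermark(dir, Set.of(compacted), 10)); // prints true
    }
}
```

This matches the scenario in the comment: with delta_5_5 alongside a minor-compacted delta_1_5_v100, the check still reports removable data, so the cleaner should not conclude there is nothing left to do.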
[jira] [Resolved] (HIVE-24573) hive 3.1.2 drop table Sometimes it can't be deleted
[ https://issues.apache.org/jira/browse/HIVE-24573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Pawar resolved HIVE-24573. - Resolution: Fixed ok > hive 3.1.2 drop table Sometimes it can't be deleted > --- > > Key: HIVE-24573 > URL: https://issues.apache.org/jira/browse/HIVE-24573 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: paul >Assignee: Nisarg Nagrale >Priority: Blocker > > Executing a "drop table if exists trade_4_Temp448" statement, the table cannot be > deleted; the hive.log shows: > 2020-12-29T07:30:04,840 ERROR [HiveServer2-Background-Pool: Thread-6483] > metadata.Hive: Table dc_usermanage.trade_3_temp448 not found: > hive.dc_usermanage.trade_3_temp448 table not found > > The statement returns success. > > I suspect this problem only arises under heavy concurrent merging. We run a lot of tasks every day, and one or two of them hit this each day. > > metastore: mysql > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-25901) unable to run query
[ https://issues.apache.org/jira/browse/HIVE-25901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482457#comment-17482457 ] Pravin Pawar commented on HIVE-25901: - ok > unable to run query > --- > > Key: HIVE-25901 > URL: https://issues.apache.org/jira/browse/HIVE-25901 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (HIVE-25901) unable to run query
[ https://issues.apache.org/jira/browse/HIVE-25901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25901 started by Pravin Pawar. --- > unable to run query > --- > > Key: HIVE-25901 > URL: https://issues.apache.org/jira/browse/HIVE-25901 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-25901) unable to run query
[ https://issues.apache.org/jira/browse/HIVE-25901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Pawar resolved HIVE-25901. - Release Note: done Resolution: Fixed > unable to run query > --- > > Key: HIVE-25901 > URL: https://issues.apache.org/jira/browse/HIVE-25901 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-25901) unable to run query
[ https://issues.apache.org/jira/browse/HIVE-25901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Pawar reassigned HIVE-25901: --- Assignee: Pravin Pawar > unable to run query > --- > > Key: HIVE-25901 > URL: https://issues.apache.org/jira/browse/HIVE-25901 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715644 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 12:26 Start Date: 26/Jan/22 12:26 Worklog Time Spent: 10m Work Description: vcsomor commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792585640 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java ## @@ -128,9 +132,40 @@ protected static long updateCycleDurationMetric(String metric, long startedAt) { long elapsed = System.currentTimeMillis() - startedAt; LOG.debug("Updating {} metric to {}", metric, elapsed); Metrics.getOrCreateGauge(metric) - .set((int)elapsed); + .set((int) elapsed); return elapsed; } return 0; } + + @VisibleForTesting + protected static void waitAllAsyncTask(List> tasks) throws AsyncTaskCompletionException { +List exceptions = new ArrayList<>(); +for (CompletableFuture task : tasks) { + try { +task.join(); Review comment: Or simply an atomic integer would suffice -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715644) Time Spent: 50m (was: 40m) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. 
> - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > return reportJoin((r = result) == null ? waitingGet(false) : r); > } > {code} > (!) Let's suppose we have 10 cleaners and the 2nd throws an exception. The > {{catch}} block will be initiated and the {{failuresCounter}} will be > incremented. If there is any consecutive error amongst the remaining cleaners > the counter won't be incremented. 
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715641 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 12:25 Start Date: 26/Jan/22 12:25 Worklog Time Spent: 10m Work Description: vcsomor commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792584766 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java ## @@ -128,9 +132,40 @@ protected static long updateCycleDurationMetric(String metric, long startedAt) { long elapsed = System.currentTimeMillis() - startedAt; LOG.debug("Updating {} metric to {}", metric, elapsed); Metrics.getOrCreateGauge(metric) - .set((int)elapsed); + .set((int) elapsed); return elapsed; } return 0; } + + @VisibleForTesting + protected static void waitAllAsyncTask(List> tasks) throws AsyncTaskCompletionException { +List exceptions = new ArrayList<>(); +for (CompletableFuture task : tasks) { + try { +task.join(); Review comment: Which thread safe list implementation do you prefer in this case? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715641) Time Spent: 40m (was: 0.5h) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. 
> - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > return reportJoin((r = result) == null ? waitingGet(false) : r); > } > {code} > (!) Let's suppose we have 10 cleaners and the 2nd throws an exception. The > {{catch}} block will be initiated and the {{failuresCounter}} will be > incremented. If there is any consecutive error amongst the remaining cleaners > the counter won't be incremented. 
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715625 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 12:12 Start Date: 26/Jan/22 12:12 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792575492 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java ## @@ -128,9 +132,40 @@ protected static long updateCycleDurationMetric(String metric, long startedAt) { long elapsed = System.currentTimeMillis() - startedAt; LOG.debug("Updating {} metric to {}", metric, elapsed); Metrics.getOrCreateGauge(metric) - .set((int)elapsed); + .set((int) elapsed); return elapsed; } return 0; } + + @VisibleForTesting + protected static void waitAllAsyncTask(List> tasks) throws AsyncTaskCompletionException { +List exceptions = new ArrayList<>(); +for (CompletableFuture task : tasks) { + try { +task.join(); Review comment: isn't it similar to CompletableFuture.allOf(List cf).join() ? you could do exceptions++ when declaring CompletableFuture by adding .exceptionally(exception -> { collectedExceptions.add(exception); return null; } -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715625) Time Spent: 0.5h (was: 20m) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The count of the below metrics counted incorrectly upon an exception. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} > which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. > The below code-snippet waits all cleaners to complete (Initiators does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If the {{CompleteableFututre#join}} throws an Exception then the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. 
To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > return reportJoin((r = result) == null ? waitingGet(false) : r); > } > {code} > (!) Suppose we have 10 cleaners and the 2nd throws an exception. The > {{catch}} block is entered and {{failuresCounter}} is incremented. If any of > the remaining cleaners also fail, the counter won't be incremented again. -- This message was sent by Atlassian Jira (v8.20.1#820001)
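For context, the per-task join approach can be sketched as follows. This is a toy example, not Hive's actual code: the class, method, and exception handling here are illustrative stand-ins for the `waitAllAsyncTask` helper discussed in the review. Joining each future individually lets every failure be observed, whereas a single `allOf(...).join()` surfaces at most one `CompletionException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class CollectFailuresSketch {
    // Joins every task individually so each failure is observed; a single
    // allOf(...).join() would surface only one CompletionException.
    static List<Throwable> joinAll(List<CompletableFuture<Void>> tasks) {
        List<Throwable> failures = new ArrayList<>();
        for (CompletableFuture<Void> task : tasks) {
            try {
                task.join();
            } catch (CompletionException e) {
                failures.add(e.getCause()); // unwrap the original exception
            }
        }
        return failures;
    }

    // Runs three tasks, two of which fail, and returns the failure count.
    static int demo() {
        List<CompletableFuture<Void>> tasks = new ArrayList<>();
        tasks.add(CompletableFuture.runAsync(() -> {}));
        tasks.add(CompletableFuture.runAsync(() -> { throw new RuntimeException("task 1 failed"); }));
        tasks.add(CompletableFuture.runAsync(() -> { throw new RuntimeException("task 2 failed"); }));
        return joinAll(tasks).size();
    }

    public static void main(String[] args) {
        System.out.println("failures counted: " + demo()); // 2
    }
}
```

With the `allOf` pattern from the issue description, the same three tasks would produce exactly one caught exception, which matches the undercounting the reporter describes.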
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715623 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 12:11 Start Date: 26/Jan/22 12:11 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2974: URL: https://github.com/apache/hive/pull/2974#discussion_r792575492 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java ## @@ -128,9 +132,40 @@ protected static long updateCycleDurationMetric(String metric, long startedAt) { long elapsed = System.currentTimeMillis() - startedAt; LOG.debug("Updating {} metric to {}", metric, elapsed); Metrics.getOrCreateGauge(metric) - .set((int)elapsed); + .set((int) elapsed); return elapsed; } return 0; } + + @VisibleForTesting + protected static void waitAllAsyncTask(List<CompletableFuture<Void>> tasks) throws AsyncTaskCompletionException { +List<Throwable> exceptions = new ArrayList<>(); +for (CompletableFuture<Void> task : tasks) { + try { +task.join(); Review comment: isn't it the same as CompletableFuture.allOf(...).join() ? you could collect the exceptions when declaring each CompletableFuture by adding .exceptionally(exception -> { collectedExceptions.add(exception); return null; }) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715623) Time Spent: 20m (was: 10m) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The below metrics are counted incorrectly when an exception occurs. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > The {{Initiator}}/{{Cleaner}} classes create a list of {{CompletableFuture}}s > whose {{Runnable}} core exceptions are wrapped in {{RuntimeException}}s. > The code snippet below waits for all cleaners to complete (the Initiator does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If {{CompletableFuture#join}} throws an Exception, the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. 
To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > return reportJoin((r = result) == null ? waitingGet(false) : r); > } > {code} > (!) Suppose we have 10 cleaners and the 2nd throws an exception. The > {{catch}} block is entered and {{failuresCounter}} is incremented. If any of > the remaining cleaners also fail, the counter won't be incremented again. -- This message was sent by Atlassian Jira (v8.20.1#820001)
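The alternative the reviewer suggests, collecting each failure via `.exceptionally()` at declaration time, can be sketched like this. This is a toy example with illustrative names, not Hive's code; it assumes the jobs are plain `Runnable`s. Because every `exceptionally` stage recovers with `null`, the downstream futures all complete normally and `allOf().join()` no longer throws, yet every failure is recorded.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ExceptionallySketch {
    // Attaches an exceptionally() stage to each future at declaration time:
    // failures are recorded into a thread-safe queue and the resulting
    // futures all complete normally, so allOf().join() never throws here.
    static int countFailures(List<Runnable> jobs) {
        Queue<Throwable> collected = new ConcurrentLinkedQueue<>();
        CompletableFuture<?>[] futures = jobs.stream()
            .map(job -> CompletableFuture.runAsync(job)
                .exceptionally(ex -> { collected.add(ex); return null; }))
            .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(futures).join(); // waits, does not throw
        return collected.size();
    }

    public static void main(String[] args) {
        int failures = countFailures(List.of(
            () -> {},
            () -> { throw new RuntimeException("boom 1"); },
            () -> { throw new RuntimeException("boom 2"); }));
        System.out.println("failures counted: " + failures); // 2
    }
}
```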
[jira] [Updated] (HIVE-25899) Materialized view registry does not clean dropped views
[ https://issues.apache.org/jira/browse/HIVE-25899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25899: -- Labels: pull-request-available (was: ) > Materialized view registry does not clean dropped views > --- > > Key: HIVE-25899 > URL: https://issues.apache.org/jira/browse/HIVE-25899 > Project: Hive > Issue Type: Bug > Components: Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > CBO plans of materialized views which are enabled for query rewrite are > cached in HS2 (MaterializedViewsCache). > Dropping a materialized view should remove the entry from the cache; however, > the entry keys are not removed. > Cache state after running a whole PTest split: > {code} > this = {HiveMaterializedViewsRegistry@20858} > materializedViewsCache = {MaterializedViewsCache@20913} > materializedViews = {ConcurrentHashMap@67654} size = 3 >"default" -> {ConcurrentHashMap@28568} size = 8 > key = "default" > value = {ConcurrentHashMap@28568} size = 8 > "cluster_mv_2" -> {HiveRelOptMaterialization@67786} > "cluster_mv_1" -> {HiveRelOptMaterialization@67788} > "cluster_mv_4" -> {HiveRelOptMaterialization@67790} > "cluster_mv_3" -> {HiveRelOptMaterialization@67792} > "cmv_mat_view_n10" -> {HiveRelOptMaterialization@67794} > "distribute_mv_1" -> {HiveRelOptMaterialization@67796} > "distribute_mv_3" -> {HiveRelOptMaterialization@67798} > "distribute_mv_2" -> {HiveRelOptMaterialization@67800} >"db2" -> {ConcurrentHashMap@67772} size = 2 > key = "db2" > value = {ConcurrentHashMap@67772} size = 2 > "cmv_mat_view_n7" -> {HiveRelOptMaterialization@67806} > "cmv_mat_view2_n2" -> {HiveRelOptMaterialization@67808} >"count_distinct" -> {ConcurrentHashMap@67774} size = 0 > key = "count_distinct" > value = {ConcurrentHashMap@67774} size = 0 > sqlToMaterializedView = {ConcurrentHashMap@20915} size = 36 >"SELECT `cmv_basetable_n100`.`a`, 
`cmv_basetable_2_n100`.`c`\n FROM > `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON > (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE > `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, > `cmv_basetable_2_n100`.`c`" -> {ArrayList@67694} size = 0 > key = "SELECT `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n > FROM `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON > (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE > `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, > `cmv_basetable_2_n100`.`c`" > value = {ArrayList@67694} size = 0 >"select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from > `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, > `emps_parquet_n3`.`deptno`" -> {ArrayList@67696} size = 0 > key = "select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from > `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, > `emps_parquet_n3`.`deptno`" > value = {ArrayList@67696} size = 0 >"select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from > `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" -> > {ArrayList@67698} size = 1 > key = "select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from > `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" > value = {ArrayList@67698} size = 1 >"SELECT `value`, `key`, `partkey` FROM (SELECT `src_txn`.`key` + 100 as > `partkey`, `src_txn`.`value`, `src_txn`.`key` FROM `default`.`src_txn`, > `default`.`src_txn_2`\nWHERE `src_txn`.`key` = `src_txn_2`.`key`\n AND > `src_txn`.`key` > 200 AND `src_txn`.`key` < 250) `cluster_mv_3`" -> > {ArrayList@67700} size = 1 > key = "SELECT `value`, `key`, `partkey` FROM (SELECT `src_txn`.`key` + > 100 as `partkey`, `src_txn`.`value`, `src_txn`.`key` FROM > `default`.`src_txn`, `default`.`src_txn_2`\nWHERE `src_txn`.`key` = > `src_txn_2`.`key`\n AND `src_txn`.`key` > 200 AND `src_txn`.`key` < 250) > `cluster_mv_3`" 
> value = {ArrayList@67700} size = 1 >"SELECT `cmv_basetable_n6`.`a`, `cmv_basetable_2_n3`.`c`\n FROM > `default`.`cmv_basetable_n6` JOIN `default`.`cmv_basetable_2_n3` ON > (`cmv_basetable_n6`.`a` = `cmv_basetable_2_n3`.`a`)\n WHERE > `cmv_basetable_2_n3`.`c` > 10.0" -> {ArrayList@67702} size = 0 > key = "SELECT `cmv_basetable_n6`.`a`, `cmv_basetable_2_n3`.`c`\n FROM > `default`.`cmv_basetable_n6` JOIN `default`.`cmv_basetable_2_n3` ON > (`cmv_basetable_n6`.`a` = `cmv_basetable_2_n3`.`a`)\n WHERE > `cmv_basetable_2_n3`.`c` > 10.0" > value = {ArrayLis
[jira] [Work logged] (HIVE-25899) Materialized view registry does not clean dropped views
[ https://issues.apache.org/jira/browse/HIVE-25899?focusedWorklogId=715620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715620 ] ASF GitHub Bot logged work on HIVE-25899: - Author: ASF GitHub Bot Created on: 26/Jan/22 12:09 Start Date: 26/Jan/22 12:09 Worklog Time Spent: 10m Work Description: kasakrisz opened a new pull request #2975: URL: https://github.com/apache/hive/pull/2975 ### What changes were proposed in this pull request? `MaterializedViewsCache` stores nested maps: ``` somedb -> someview -> Materialization ``` 1. When removing entries from the inner map, check whether that map is empty and, if so, remove it from the outer map. 2. Add an `isEmpty()` method to `HiveMaterializedViewsRegistry` ### Why are the changes needed? See the description of the jira. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? ``` mvn test -Dtest=TestMaterializedViewsCache -pl ql ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715620) Remaining Estimate: 0h Time Spent: 10m > Materialized view registry does not clean dropped views > --- > > Key: HIVE-25899 > URL: https://issues.apache.org/jira/browse/HIVE-25899 > Project: Hive > Issue Type: Bug > Components: Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > CBO plans of materialized views which are enabled for query rewrite are > cached in HS2 (MaterializedViewsCache) > Dropping a materialized view should remove the entry from the cache; however, > the entry keys are not removed. 
> Cache state after running a whole PTest split: > {code} > this = {HiveMaterializedViewsRegistry@20858} > materializedViewsCache = {MaterializedViewsCache@20913} > materializedViews = {ConcurrentHashMap@67654} size = 3 >"default" -> {ConcurrentHashMap@28568} size = 8 > key = "default" > value = {ConcurrentHashMap@28568} size = 8 > "cluster_mv_2" -> {HiveRelOptMaterialization@67786} > "cluster_mv_1" -> {HiveRelOptMaterialization@67788} > "cluster_mv_4" -> {HiveRelOptMaterialization@67790} > "cluster_mv_3" -> {HiveRelOptMaterialization@67792} > "cmv_mat_view_n10" -> {HiveRelOptMaterialization@67794} > "distribute_mv_1" -> {HiveRelOptMaterialization@67796} > "distribute_mv_3" -> {HiveRelOptMaterialization@67798} > "distribute_mv_2" -> {HiveRelOptMaterialization@67800} >"db2" -> {ConcurrentHashMap@67772} size = 2 > key = "db2" > value = {ConcurrentHashMap@67772} size = 2 > "cmv_mat_view_n7" -> {HiveRelOptMaterialization@67806} > "cmv_mat_view2_n2" -> {HiveRelOptMaterialization@67808} >"count_distinct" -> {ConcurrentHashMap@67774} size = 0 > key = "count_distinct" > value = {ConcurrentHashMap@67774} size = 0 > sqlToMaterializedView = {ConcurrentHashMap@20915} size = 36 >"SELECT `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n FROM > `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON > (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE > `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, > `cmv_basetable_2_n100`.`c`" -> {ArrayList@67694} size = 0 > key = "SELECT `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n > FROM `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON > (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE > `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, > `cmv_basetable_2_n100`.`c`" > value = {ArrayList@67694} size = 0 >"select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from > `default`.`emps_parquet_n3` group by 
`emps_parquet_n3`.`empid`, > `emps_parquet_n3`.`deptno`" -> {ArrayList@67696} size = 0 > key = "select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from > `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, > `emps_parquet_n3`.`deptno`" > value = {ArrayList@67696} size = 0 >"select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from > `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" -> > {ArrayList@67698} size = 1 > key = "select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from > `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3"
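The removal step described in the pull request, dropping the inner map once its last view is removed, can be sketched with a plain nested `ConcurrentHashMap`. This is a simplified, hypothetical stand-in for `MaterializedViewsCache`; the names and value types are illustrative, not Hive's. `computeIfPresent` removes the outer entry when the remapping function returns `null`, and it runs atomically per key, so concurrent puts and removes on the same database do not race.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class NestedCacheSketch {
    // Hypothetical stand-in for MaterializedViewsCache: dbName -> viewName -> plan.
    static final ConcurrentMap<String, ConcurrentMap<String, Object>> cache =
        new ConcurrentHashMap<>();

    static void put(String db, String view, Object plan) {
        cache.computeIfAbsent(db, k -> new ConcurrentHashMap<>()).put(view, plan);
    }

    // Removing the last view of a database also removes the now-empty inner
    // map: returning null from the remapping function drops the outer entry.
    static void remove(String db, String view) {
        cache.computeIfPresent(db, (k, views) -> {
            views.remove(view);
            return views.isEmpty() ? null : views;
        });
    }

    public static void main(String[] args) {
        put("default", "mv1", new Object());
        put("default", "mv2", new Object());
        remove("default", "mv1");
        remove("default", "mv2");
        System.out.println(cache.containsKey("default")); // false: no empty inner map lingers
    }
}
```

Without the emptiness check, dropped databases would linger as empty keys in the outer map, which is the leak the bug report's cache dump shows (for example the `count_distinct` entry of size 0).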
[jira] [Assigned] (HIVE-25900) Materialized view registry does not clean non existing views at refresh
[ https://issues.apache.org/jira/browse/HIVE-25900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-25900: - > Materialized view registry does not clean non existing views at refresh > --- > > Key: HIVE-25900 > URL: https://issues.apache.org/jira/browse/HIVE-25900 > Project: Hive > Issue Type: Bug > Components: Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > > CBO plans of materialized views which are enabled for query rewrite are > cached in HS2 (MaterializedViewsCache, HiveMaterializedViewsRegistry) > The registry is refreshed periodically from HMS: > {code:java} > set hive.server2.materializedviews.registry.refresh.period=1500s; > {code} > This functionality is required when multiple HS2 instances are used in a > cluster: the MV drop operation is served by one of the HS2 instances, and the > registry is updated at that time in that instance. However, other HS2 > instances still cache the non-existent view and need to be refreshed by the > updater thread. > Currently the updater thread adds new entries and refreshes existing ones, but does > not remove the outdated entries. -- This message was sent by Atlassian Jira (v8.20.1#820001)
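The missing cleanup step can be sketched in a few lines. This is a hedged, illustrative example, not Hive's actual updater code: it assumes a flat registry keyed by view name and an HMS snapshot fetched elsewhere. If the periodic refresh retained only the names present in the latest snapshot, outdated entries would be dropped alongside the adds and updates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RegistryRefreshSketch {
    // Hypothetical registry: "db.view" -> cached materialization.
    static final Map<String, Object> registry = new ConcurrentHashMap<>();

    // A refresh that only adds and updates entries leaks dropped views;
    // retaining only the keys present in the latest HMS snapshot removes
    // the outdated entries as well.
    static void refresh(Map<String, Object> hmsSnapshot) {
        registry.putAll(hmsSnapshot);                       // add and update
        registry.keySet().retainAll(hmsSnapshot.keySet());  // drop outdated entries
    }

    public static void main(String[] args) {
        registry.put("default.dropped_mv", new Object());   // dropped on another HS2
        refresh(Map.of("default.live_mv", new Object()));   // latest HMS state
        System.out.println(registry.keySet()); // only default.live_mv remains
    }
}
```

`keySet().retainAll(...)` on a `ConcurrentHashMap` writes through to the map, so the removal is visible to concurrent readers without extra locking; a real implementation would still need to decide how to order this against in-flight lookups.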
[jira] [Updated] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25746: -- Labels: pull-request-available (was: ) > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The below metrics are counted incorrectly when an exception occurs. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > The {{Initiator}}/{{Cleaner}} classes create a list of {{CompletableFuture}}s > whose {{Runnable}} core exceptions are wrapped in {{RuntimeException}}s. > The code snippet below waits for all cleaners to complete (the Initiator does it > similarly). > {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If {{CompletableFuture#join}} throws an Exception, the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. 
> * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > return reportJoin((r = result) == null ? waitingGet(false) : r); > } > {code} > (!) Suppose we have 10 cleaners and the 2nd throws an exception. The > {{catch}} block is entered and the {{failuresCounter}} is incremented. If any > of the remaining cleaners also fail, the counter won't be incremented again. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25746) Compaction Failure Counter counted incorrectly
[ https://issues.apache.org/jira/browse/HIVE-25746?focusedWorklogId=715615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715615 ] ASF GitHub Bot logged work on HIVE-25746: - Author: ASF GitHub Bot Created on: 26/Jan/22 11:51 Start Date: 26/Jan/22 11:51 Worklog Time Spent: 10m Work Description: vcsomor opened a new pull request #2974: URL: https://github.com/apache/hive/pull/2974 Fixing the compaction_initiator_failure/compaction_cleaner_failure_counter logic in the Initiator and Cleaner. After this fix, all failures will be counted, not just the first one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715615) Remaining Estimate: 0h Time Spent: 10m > Compaction Failure Counter counted incorrectly > -- > > Key: HIVE-25746 > URL: https://issues.apache.org/jira/browse/HIVE-25746 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Viktor Csomor >Assignee: Viktor Csomor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > The below metrics are counted incorrectly when an exception occurs. > - {{compaction_initator_failure_counter}} > - {{compaction_cleaner_failure_counter}} > Reasoning: > The {{Initiator}}/{{Cleaner}} classes create a list of {{CompletableFuture}}s > whose {{Runnable}} core exceptions are wrapped in {{RuntimeException}}s. > The code snippet below waits for all cleaners to complete (the Initiator does it > similarly). 
> {code:java} > try { > > for (CompactionInfo compactionInfo : readyToClean) { > > cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() > -> > clean(compactionInfo, cleanerWaterMark, > metricsEnabled)), cleanerExecutor)); > } > CompletableFuture.allOf(cleanerList.toArray(new > CompletableFuture[0])).join(); > } > } catch (Throwable t) { > // the lock timeout on AUX lock, should be ignored. > if (metricsEnabled && handle != null) { > failuresCounter.inc(); > } > {code} > If {{CompletableFuture#join}} throws an Exception, the failure > counter is incremented. > Docs: > {code} > /** > * Returns the result value when complete, or throws an > * (unchecked) exception if completed exceptionally. To better > * conform with the use of common functional forms, if a > * computation involved in the completion of this > * CompletableFuture threw an exception, this method throws an > * (unchecked) {@link CompletionException} with the underlying > * exception as its cause. > * > * @return the result value > * @throws CancellationException if the computation was cancelled > * @throws CompletionException if this future completed > * exceptionally or a completion computation threw an exception > */ > public T join() { > Object r; > return reportJoin((r = result) == null ? waitingGet(false) : r); > } > {code} > (!) Suppose we have 10 cleaners and the 2nd throws an exception. The > {{catch}} block is entered and the {{failuresCounter}} is incremented. If any > of the remaining cleaners also fail, the counter won't be incremented again. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-25899) Materialized view registry does not clean dropped views
[ https://issues.apache.org/jira/browse/HIVE-25899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-25899: - > Materialized view registry does not clean dropped views > --- > > Key: HIVE-25899 > URL: https://issues.apache.org/jira/browse/HIVE-25899 > Project: Hive > Issue Type: Bug > Components: Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > > CBO plans of materialized views which are enabled for query rewrite are > cached in HS2 (MaterializedViewsCache). > Dropping a materialized view should remove the entry from the cache; however, > the entry keys are not removed. > Cache state after running a whole PTest split: > {code} > this = {HiveMaterializedViewsRegistry@20858} > materializedViewsCache = {MaterializedViewsCache@20913} > materializedViews = {ConcurrentHashMap@67654} size = 3 >"default" -> {ConcurrentHashMap@28568} size = 8 > key = "default" > value = {ConcurrentHashMap@28568} size = 8 > "cluster_mv_2" -> {HiveRelOptMaterialization@67786} > "cluster_mv_1" -> {HiveRelOptMaterialization@67788} > "cluster_mv_4" -> {HiveRelOptMaterialization@67790} > "cluster_mv_3" -> {HiveRelOptMaterialization@67792} > "cmv_mat_view_n10" -> {HiveRelOptMaterialization@67794} > "distribute_mv_1" -> {HiveRelOptMaterialization@67796} > "distribute_mv_3" -> {HiveRelOptMaterialization@67798} > "distribute_mv_2" -> {HiveRelOptMaterialization@67800} >"db2" -> {ConcurrentHashMap@67772} size = 2 > key = "db2" > value = {ConcurrentHashMap@67772} size = 2 > "cmv_mat_view_n7" -> {HiveRelOptMaterialization@67806} > "cmv_mat_view2_n2" -> {HiveRelOptMaterialization@67808} >"count_distinct" -> {ConcurrentHashMap@67774} size = 0 > key = "count_distinct" > value = {ConcurrentHashMap@67774} size = 0 > sqlToMaterializedView = {ConcurrentHashMap@20915} size = 36 >"SELECT `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n FROM > `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON > 
(`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE > `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, > `cmv_basetable_2_n100`.`c`" -> {ArrayList@67694} size = 0 > key = "SELECT `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n > FROM `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON > (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE > `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, > `cmv_basetable_2_n100`.`c`" > value = {ArrayList@67694} size = 0 >"select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from > `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, > `emps_parquet_n3`.`deptno`" -> {ArrayList@67696} size = 0 > key = "select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from > `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, > `emps_parquet_n3`.`deptno`" > value = {ArrayList@67696} size = 0 >"select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from > `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" -> > {ArrayList@67698} size = 1 > key = "select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from > `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" > value = {ArrayList@67698} size = 1 >"SELECT `value`, `key`, `partkey` FROM (SELECT `src_txn`.`key` + 100 as > `partkey`, `src_txn`.`value`, `src_txn`.`key` FROM `default`.`src_txn`, > `default`.`src_txn_2`\nWHERE `src_txn`.`key` = `src_txn_2`.`key`\n AND > `src_txn`.`key` > 200 AND `src_txn`.`key` < 250) `cluster_mv_3`" -> > {ArrayList@67700} size = 1 > key = "SELECT `value`, `key`, `partkey` FROM (SELECT `src_txn`.`key` + > 100 as `partkey`, `src_txn`.`value`, `src_txn`.`key` FROM > `default`.`src_txn`, `default`.`src_txn_2`\nWHERE `src_txn`.`key` = > `src_txn_2`.`key`\n AND `src_txn`.`key` > 200 AND `src_txn`.`key` < 250) > `cluster_mv_3`" > value = {ArrayList@67700} size = 1 >"SELECT `cmv_basetable_n6`.`a`, `cmv_basetable_2_n3`.`c`\n FROM > 
`default`.`cmv_basetable_n6` JOIN `default`.`cmv_basetable_2_n3` ON > (`cmv_basetable_n6`.`a` = `cmv_basetable_2_n3`.`a`)\n WHERE > `cmv_basetable_2_n3`.`c` > 10.0" -> {ArrayList@67702} size = 0 > key = "SELECT `cmv_basetable_n6`.`a`, `cmv_basetable_2_n3`.`c`\n FROM > `default`.`cmv_basetable_n6` JOIN `default`.`cmv_basetable_2_n3` ON > (`cmv_basetable_n6`.`a` = `cmv_basetable_2_n3`.`a`)\n WHERE > `cmv_basetable_2_n3`.`c` > 10.0" > value = {ArrayList@67702} size = 0 >"SELECT `src_txn`.`key`, `src_txn`.`value` FROM `default`.`src_txn` where > `src_txn`.`key` > 200 and `src_txn`
[jira] [Work logged] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do
[ https://issues.apache.org/jira/browse/HIVE-25883?focusedWorklogId=715611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715611 ] ASF GitHub Bot logged work on HIVE-25883: - Author: ASF GitHub Bot Created on: 26/Jan/22 11:40 Start Date: 26/Jan/22 11:40 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #2971: URL: https://github.com/apache/hive/pull/2971#discussion_r792552279 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -434,8 +437,18 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa return success; } - private boolean hasDataBelowWatermark(FileSystem fs, Path path, long highWatermark) throws IOException { -FileStatus[] children = fs.listStatus(path); + private boolean hasDataBelowWatermark(AcidDirectory acidDir, FileSystem fs, Path path, long highWatermark) + throws IOException { +Set<Path> acidPaths = new HashSet<>(); +for (ParsedDelta delta : acidDir.getCurrentDirectories()) { + acidPaths.add(delta.getPath()); +} +if (acidDir.getBaseDirectory() != null) { + acidPaths.add(acidDir.getBaseDirectory()); +} +FileStatus[] children = fs.listStatus(path, p -> { + return !acidPaths.contains(p); +}); for (FileStatus child : children) { if (isFileBelowWatermark(child, highWatermark)) { Review comment: 1. I believe that in case there are files in the dir they should already be in the `obsolete` list; I just wanted to be conservative in this method - but I think returning true there would be correct as well 2. the `highWatermark` is inclusive; but this method's name is isBelowWatermark - so it only looks for files which are below the watermark w.r.t. `delta_1_5`; I think it's not below `5` because it contains data from `writeId` 5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715611) Time Spent: 1h 20m (was: 1h 10m) > Enhance Compaction Cleaner to skip when there is nothing to do > -- > > Key: HIVE-25883 > URL: https://issues.apache.org/jira/browse/HIVE-25883 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > the cleaner works the following way: > * it identifies obsolete directories (delta dirs that don't have open > txns) > * removes them and is done > if there are no obsolete directories, that is attributed to possibly open txns, so the request > should be retried later. > however, if for some reason the directory was already cleaned - similarly it > has no obsolete directories; and thus the request is retried forever -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do
[ https://issues.apache.org/jira/browse/HIVE-25883?focusedWorklogId=715590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715590 ] ASF GitHub Bot logged work on HIVE-25883: - Author: ASF GitHub Bot Created on: 26/Jan/22 11:00 Start Date: 26/Jan/22 11:00 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2971: URL: https://github.com/apache/hive/pull/2971#discussion_r792524567 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -434,8 +437,18 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa return success; } - private boolean hasDataBelowWatermark(FileSystem fs, Path path, long highWatermark) throws IOException { -FileStatus[] children = fs.listStatus(path); + private boolean hasDataBelowWatermark(AcidDirectory acidDir, FileSystem fs, Path path, long highWatermark) + throws IOException { +Set<Path> acidPaths = new HashSet<>(); +for (ParsedDelta delta : acidDir.getCurrentDirectories()) { + acidPaths.add(delta.getPath()); +} +if (acidDir.getBaseDirectory() != null) { + acidPaths.add(acidDir.getBaseDirectory()); +} +FileStatus[] children = fs.listStatus(path, p -> { + return !acidPaths.contains(p); +}); for (FileStatus child : children) { if (isFileBelowWatermark(child, highWatermark)) { Review comment: Commenting on the contents of isFileBelowWatermark since I can't comment there... 1. `if (!child.isDirectory()) { return false; }` There could be original files in the table directory that should be deleted. 2. `return b.getWriteId() < highWatermark;` the highWatermark is inclusive, so if for some reason the table directory contains: delta_5_5 and delta_1_5_v100 (minor compacted, which includes the data in delta_5_5), then isFileBelowWatermark would return false but it should return true. -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 715590) Time Spent: 1h 10m (was: 1h) > Enhance Compaction Cleaner to skip when there is nothing to do > -- > > Key: HIVE-25883 > URL: https://issues.apache.org/jira/browse/HIVE-25883 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > the cleaner works the following way: > * it identifies obsolete directories (delta dirs that don't have open > txns) > * removes them and is done > if there are no obsolete directories, that is attributed to possibly open txns, so the request > should be retried later. > however, if for some reason the directory was already cleaned - similarly it > has no obsolete directories; and thus the request is retried forever -- This message was sent by Atlassian Jira (v8.20.1#820001)
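The inclusive-watermark point from the review discussion can be illustrated with the delta directory naming it uses. This is a toy parser, not Hive's AcidUtils; it assumes names of the form delta_minWriteId_maxWriteId with an optional _v visibility suffix, as in the examples above.

```java
public class WatermarkSketch {
    // Toy parser for delta directory names like delta_1_5 or delta_1_5_v100
    // (delta_<minWriteId>_<maxWriteId>[_v<visibilityTxnId>]).
    static long maxWriteId(String deltaDirName) {
        String[] parts = deltaDirName.split("_");
        return Long.parseLong(parts[2]);
    }

    // The highWatermark is inclusive, so delta_1_5 is NOT strictly below
    // watermark 5: it contains data written at writeId 5 itself.
    static boolean isBelowWatermark(String deltaDirName, long highWatermark) {
        return maxWriteId(deltaDirName) < highWatermark;
    }

    public static void main(String[] args) {
        System.out.println(isBelowWatermark("delta_1_4", 5));      // true
        System.out.println(isBelowWatermark("delta_1_5", 5));      // false
        System.out.println(isBelowWatermark("delta_1_5_v100", 5)); // false
    }
}
```

This matches the disagreement in the thread: with a strict `<` comparison, a compacted delta_1_5_v100 that already covers delta_5_5 is not counted as "below" watermark 5, even though the cleaner could treat the superseded delta_5_5 as removable.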
[jira] [Commented] (HIVE-25707) SchemaTool may leave the metastore in-between upgrade steps
[ https://issues.apache.org/jira/browse/HIVE-25707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482378#comment-17482378 ]

Zoltan Haindrich commented on HIVE-25707:
-----------------------------------------

[~rahulp] yes; it could probably catch a lot of problematic cases.
I wrote a test for it - but we run the SQL files using sqlline; if I disable auto-commit, the file is executed without being committed at the end... unless the JDBC driver auto-commits it...
I'll leave a reference to my branch here in case someone picks this up later:
https://github.com/kgyrtkirk/hive/tree/HIVE-25707-schematool-commit

> SchemaTool may leave the metastore in-between upgrade steps
> -----------------------------------------------------------
>
>                 Key: HIVE-25707
>                 URL: https://issues.apache.org/jira/browse/HIVE-25707
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Priority: Major
>
> It seems like:
> * schematool runs the sql files via beeline
> * autocommit is turned on
> * pressing ctrl+c or killing the process will result in an invalid schema
> https://github.com/apache/hive/blob/6e02f6164385a370ee8014c795bee1fa423d7937/beeline/src/java/org/apache/hive/beeline/schematool/HiveSchemaTool.java#L79

-- This message was sent by Atlassian Jira
(v8.20.1#820001)
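The failure mode discussed above (autocommit on, so killing the process mid-script leaves a half-upgraded schema) points toward running the whole upgrade script in a single transaction. The sketch below is hypothetical: SingleCommitSketch and splitScript are illustrative names, splitScript is deliberately naive (it ignores semicolons inside string literals and comments), and this is not how HiveSchemaTool actually works today, since it delegates to beeline/sqlline rather than raw JDBC.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleCommitSketch {

    /** Split a schema script into individual statements (naive: no string/comment handling). */
    static List<String> splitScript(String script) {
        List<String> statements = new ArrayList<>();
        for (String stmt : script.split(";")) {
            String trimmed = stmt.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements;
    }

    /** Run all statements in one transaction: either everything commits, or nothing does. */
    static void runInOneTransaction(java.sql.Connection conn, List<String> statements)
            throws java.sql.SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // without this, each statement commits on its own
        try (java.sql.Statement s = conn.createStatement()) {
            for (String sql : statements) {
                s.execute(sql);
            }
            conn.commit(); // a kill before this point leaves no statement applied
        } catch (java.sql.SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

The caveat Zoltan raises still applies: many databases (MySQL, for example) implicitly commit DDL statements regardless of the autocommit setting, so a single-transaction wrapper cannot make schema upgrades atomic on every backend.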
[jira] [Commented] (HIVE-24573) hive 3.1.2 drop table Sometimes it can't be deleted
[ https://issues.apache.org/jira/browse/HIVE-24573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482356#comment-17482356 ]

Nisarg Nagrale commented on HIVE-24573:
---------------------------------------

First check that your table is in the dc_usermanage database; otherwise, when dropping the table, qualify it as database.tablename.

> hive 3.1.2 drop table Sometimes it can't be deleted
> ---------------------------------------------------
>
>                 Key: HIVE-24573
>                 URL: https://issues.apache.org/jira/browse/HIVE-24573
>             Project: Hive
>          Issue Type: Bug
>  Affects Versions: 3.1.2
>            Reporter: paul
>            Assignee: Nisarg Nagrale
>            Priority: Blocker
>
> Executing a "drop table if exists trade_4_temp448" statement, the table cannot be deleted; hive.log shows:
> 2020-12-29T07:30:04,840 ERROR [HiveServer2-Background-Pool: Thread-6483] metadata.Hive: Table dc_usermanage.trade_3_temp448 not found: hive.dc_usermanage.trade_3_temp448 table not found
>
> The statement returns success.
>
> I suspect this problem only arises under heavy merge load. We run a lot of tasks every day, and one or two tasks a day will hit it.
>
> metastore: mysql

-- This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Assigned] (HIVE-24573) hive 3.1.2 drop table Sometimes it can't be deleted
[ https://issues.apache.org/jira/browse/HIVE-24573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nisarg Nagrale reassigned HIVE-24573:
-------------------------------------

Assignee: Nisarg Nagrale

> hive 3.1.2 drop table Sometimes it can't be deleted
> ---------------------------------------------------
>
>                 Key: HIVE-24573
>                 URL: https://issues.apache.org/jira/browse/HIVE-24573
>             Project: Hive
>          Issue Type: Bug
>  Affects Versions: 3.1.2
>            Reporter: paul
>            Assignee: Nisarg Nagrale
>            Priority: Blocker
>
> Executing a "drop table if exists trade_4_temp448" statement, the table cannot be deleted; hive.log shows:
> 2020-12-29T07:30:04,840 ERROR [HiveServer2-Background-Pool: Thread-6483] metadata.Hive: Table dc_usermanage.trade_3_temp448 not found: hive.dc_usermanage.trade_3_temp448 table not found
>
> The statement returns success.
>
> I suspect this problem only arises under heavy merge load. We run a lot of tasks every day, and one or two tasks a day will hit it.
>
> metastore: mysql

-- This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Assigned] (HIVE-25891) Improve Iceberg error message for unsupported vectorization cases
[ https://issues.apache.org/jira/browse/HIVE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nisarg Nagrale reassigned HIVE-25891:
-------------------------------------

Assignee: Nisarg Nagrale  (was: Marton Bod)

> Improve Iceberg error message for unsupported vectorization cases
> -----------------------------------------------------------------
>
>                 Key: HIVE-25891
>                 URL: https://issues.apache.org/jira/browse/HIVE-25891
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Marton Bod
>            Assignee: Nisarg Nagrale
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if you attempt to read a Parquet or Avro Iceberg table with vectorization turned on, you will eventually get an error message, since this is not supported. However, that error message is very misleading and does not explain clearly what the problem is or how to work around it.

-- This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25777) ACID: Pick the compactor transaction over insert dir
[ https://issues.apache.org/jira/browse/HIVE-25777?focusedWorklogId=715543&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715543 ]

ASF GitHub Bot logged work on HIVE-25777:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jan/22 09:23
            Start Date: 26/Jan/22 09:23
    Worklog Time Spent: 10m
      Work Description: deniskuzZ commented on a change in pull request #2968:
URL: https://github.com/apache/hive/pull/2968#discussion_r792445062

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -1812,7 +1812,12 @@ private static void processBaseDir(Path baseDir, ValidWriteIdList writeIdList, V
       directory.getAbortedWriteIds().add(parsedBase.writeId);
       return;
     }
-    if (directory.getBase() == null || directory.getBase().getWriteId() < writeId) {
+    if (directory.getBase() == null || directory.getBase().getWriteId() < writeId
+        // If there are two competing versions of a particular write-id, one from the compactor and another from IOW,
+        // always pick the compactor one once it is committed.
+        || directory.getBase().getWriteId() == writeId && parsedBase.getVisibilityTxnId() > 0

Review comment:
       fixed

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715543)
    Time Spent: 0.5h (was: 20m)

> ACID: Pick the compactor transaction over insert dir
> ----------------------------------------------------
>
>                 Key: HIVE-25777
>                 URL: https://issues.apache.org/jira/browse/HIVE-25777
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.1.2, 4.0.0
>            Reporter: Gopal Vijayaraghavan
>            Priority: Major
>              Labels: Compaction, pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If there are two competing versions of a particular write-id, one from the compactor and another from the original insert, always pick the compactor one once it is committed.
> If the directory structure looks like
> {code}
> base_11/
> base_11_v192/
> {code}
> Then always pick the v192 transaction if txnid=192 is committed.
> This is required to ensure that the raw base_ dir can be deleted safely on non-atomic directory deletions (like s3), without a race condition between getSplits and the actual file-reader.

-- This message was sent by Atlassian Jira
(v8.20.1#820001)
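The precedence rule in the patch (higher write-id wins; at an equal write-id, a committed compactor base beats the insert-written one) can be sketched as below. This is a simplification, not Hive's actual AcidUtils code: ParsedBase, pickBase, and pick are hypothetical names, the real condition in the PR has more context than shown in the diff, and the check follows klcopp's suggestion of an isCompactedBase-style helper instead of a raw visibilityTxnId comparison.

```java
public class BasePickSketch {

    /** Minimal stand-in for a parsed base dir name like "base_11" or "base_11_v192". */
    static final class ParsedBase {
        final String dirName;
        final long writeId;
        final long visibilityTxnId; // 0 for an insert-written base, > 0 for a compactor-written one

        ParsedBase(String dirName) {
            // base_<writeId>[_v<visibilityTxnId>]
            this.dirName = dirName;
            String[] parts = dirName.split("_");
            this.writeId = Long.parseLong(parts[1]);
            this.visibilityTxnId = parts.length > 2 ? Long.parseLong(parts[2].substring(1)) : 0;
        }

        boolean isCompactedBase() {
            return visibilityTxnId > 0;
        }
    }

    /** Higher writeId wins; at equal writeId the compactor base beats the insert base. */
    static ParsedBase pickBase(ParsedBase current, ParsedBase candidate) {
        if (current == null
            || current.writeId < candidate.writeId
            || (current.writeId == candidate.writeId
                && candidate.isCompactedBase() && !current.isCompactedBase())) {
            return candidate;
        }
        return current;
    }

    /** Convenience wrapper over directory names. */
    static String pick(String currentName, String candidateName) {
        return pickBase(new ParsedBase(currentName), new ParsedBase(candidateName)).dirName;
    }

    public static void main(String[] args) {
        // base_11 (raw insert) vs base_11_v192 (committed compactor output): the compactor wins.
        System.out.println(pick("base_11", "base_11_v192")); // prints base_11_v192
    }
}
```

This matches the Jira example: with both base_11/ and base_11_v192/ on disk, readers settle on the v192 base, so the raw base_11/ can be deleted later without a reader racing against a non-atomic S3 delete.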
[jira] [Work logged] (HIVE-25777) ACID: Pick the compactor transaction over insert dir
[ https://issues.apache.org/jira/browse/HIVE-25777?focusedWorklogId=715538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715538 ]

ASF GitHub Bot logged work on HIVE-25777:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jan/22 08:55
            Start Date: 26/Jan/22 08:55
    Worklog Time Spent: 10m
      Work Description: klcopp commented on a change in pull request #2968:
URL: https://github.com/apache/hive/pull/2968#discussion_r792422548

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -1812,7 +1812,12 @@ private static void processBaseDir(Path baseDir, ValidWriteIdList writeIdList, V
       directory.getAbortedWriteIds().add(parsedBase.writeId);
       return;
     }
-    if (directory.getBase() == null || directory.getBase().getWriteId() < writeId) {
+    if (directory.getBase() == null || directory.getBase().getWriteId() < writeId
+        // If there are two competing versions of a particular write-id, one from the compactor and another from IOW,
+        // always pick the compactor one once it is committed.
+        || directory.getBase().getWriteId() == writeId && parsedBase.getVisibilityTxnId() > 0

Review comment:
       Just kind of a nit: there's an `isCompactedBase` method you could use instead of `parsedBase.getVisibilityTxnId() > 0`. It doesn't do much more, but it would make this more readable.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 715538)
    Time Spent: 20m (was: 10m)

> ACID: Pick the compactor transaction over insert dir
> ----------------------------------------------------
>
>                 Key: HIVE-25777
>                 URL: https://issues.apache.org/jira/browse/HIVE-25777
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.1.2, 4.0.0
>            Reporter: Gopal Vijayaraghavan
>            Priority: Major
>              Labels: Compaction, pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> If there are two competing versions of a particular write-id, one from the compactor and another from the original insert, always pick the compactor one once it is committed.
> If the directory structure looks like
> {code}
> base_11/
> base_11_v192/
> {code}
> Then always pick the v192 transaction if txnid=192 is committed.
> This is required to ensure that the raw base_ dir can be deleted safely on non-atomic directory deletions (like s3), without a race condition between getSplits and the actual file-reader.

-- This message was sent by Atlassian Jira
(v8.20.1#820001)