[jira] [Work logged] (HIVE-25117) Vector PTF ClassCastException with Decimal64

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25117?focusedWorklogId=598437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598437
 ]

ASF GitHub Bot logged work on HIVE-25117:
-

Author: ASF GitHub Bot
Created on: 18/May/21 05:56
Start Date: 18/May/21 05:56
Worklog Time Spent: 10m 
  Work Description: ramesh0201 opened a new pull request #2286:
URL: https://github.com/apache/hive/pull/2286


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598437)
Remaining Estimate: 0h
Time Spent: 10m

> Vector PTF ClassCastException with Decimal64
> 
>
> Key: HIVE-25117
> URL: https://issues.apache.org/jira/browse/HIVE-25117
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
> Attachments: vector_ptf_classcast_exception.q
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Only reproduces when there is at least 1 buffered batch, so needed 2 rows 
> with 1 row/batch:
> {code:java}
> set hive.vectorized.testing.reducer.batch.size=1;
> {code}
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.copyNonSelectedColumnVector(VectorizedBatchUtil.java:664)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.forwardBufferedBatches(VectorPTFGroupBatches.java:228)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.fillGroupResultsAndForward(VectorPTFGroupBatches.java:318)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.process(VectorPTFOperator.java:403)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:497)
> {code}
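
The cast fails because the buffered batch carries a DecimalColumnVector while the copy path assumes the Decimal64 form (a LongColumnVector). Below is a minimal, hypothetical sketch of a type-aware per-row copy that avoids the blind cast; the class and helper names are made up for illustration and this is not the actual Hive patch.

{code:java}
import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;

public class ColumnCopySketch {
  // Hypothetical helper: dispatch on the runtime vector type instead of casting
  // every decimal column to LongColumnVector (the Decimal64 representation).
  static void copyValue(ColumnVector src, int srcRow, ColumnVector dst, int dstRow) {
    if (src instanceof DecimalColumnVector && dst instanceof DecimalColumnVector) {
      ((DecimalColumnVector) dst).set(dstRow, ((DecimalColumnVector) src).vector[srcRow]);
    } else if (src instanceof LongColumnVector && dst instanceof LongColumnVector) {
      // Covers plain longs as well as Decimal64ColumnVector, which extends LongColumnVector.
      ((LongColumnVector) dst).vector[dstRow] = ((LongColumnVector) src).vector[srcRow];
    } else {
      throw new IllegalStateException("Mismatched column vector types: "
          + src.getClass().getSimpleName() + " vs " + dst.getClass().getSimpleName());
    }
  }
}
{code}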



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25117) Vector PTF ClassCastException with Decimal64

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25117:
--
Labels: pull-request-available  (was: )

> Vector PTF ClassCastException with Decimal64
> 
>
> Key: HIVE-25117
> URL: https://issues.apache.org/jira/browse/HIVE-25117
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Attachments: vector_ptf_classcast_exception.q
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Only reproduces when there is at least 1 buffered batch, so needed 2 rows 
> with 1 row/batch:
> {code:java}
> set hive.vectorized.testing.reducer.batch.size=1;
> {code}
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.copyNonSelectedColumnVector(VectorizedBatchUtil.java:664)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.forwardBufferedBatches(VectorPTFGroupBatches.java:228)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.fillGroupResultsAndForward(VectorPTFGroupBatches.java:318)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.process(VectorPTFOperator.java:403)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:497)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25120) VectorizedParquetRecordReader can't handle encrypted parquet files

2021-05-17 Thread George Song (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Song updated HIVE-25120:
---
Summary: VectorizedParquetRecordReader can't handle encrypted parquet files 
 (was: VectorizedParquetRecordReader can't to read parquet file with encrypted 
footer)

> VectorizedParquetRecordReader can't handle encrypted parquet files
> --
>
> Key: HIVE-25120
> URL: https://issues.apache.org/jira/browse/HIVE-25120
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet
>Affects Versions: 3.1.2
>Reporter: George Song
>Priority: Major
>
> In parquet 1.12.0 the modular encryption feature was introduced: 
> https://issues.apache.org/jira/browse/PARQUET-1178 
> VectorizedParquetRecordReader can't read parquet files with an encrypted 
> footer. It throws the following exception. 
> {code:java}
> Error: java.io.IOException: java.lang.reflect.InvocationTargetException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:217)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
>   ... 11 more
> Caused by: java.lang.RuntimeException: 
> org.apache.parquet.crypto.ParquetCryptoRuntimeException: Trying to read file 
> with encrypted footer. No keys available
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.<init>(VectorizedParquetRecordReader.java:156)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:99)
>   ... 16 more
> Caused by: org.apache.parquet.crypto.ParquetCryptoRuntimeException: Trying to 
> read file with encrypted footer. No keys available
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:588)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:527)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:521)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readFooterFromFile(VectorizedParquetRecordReader.java:345)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:310)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:222)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.<init>(VectorizedParquetRecordReader.java:151)
>   ... 19 more
> {code}
>  
>  
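
For context on the "No keys available" error: with parquet-mr 1.12 the reader has to supply the footer key (or a key retriever) through FileDecryptionProperties before the footer can be parsed. Below is a hedged, standalone sketch of that read path; the class and method names follow my reading of the parquet-mr 1.12 crypto API, the key bytes and file path are placeholders, and this is not the Hive change being proposed.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.ParquetReadOptions;
import org.apache.parquet.crypto.FileDecryptionProperties;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class EncryptedFooterReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path file = new Path(args[0]);      // an encrypted parquet file (placeholder)
    byte[] footerKey = new byte[16];    // placeholder 128-bit key, for illustration only

    // Without decryption properties, reading the footer fails with
    // "Trying to read file with encrypted footer. No keys available".
    FileDecryptionProperties decryption =
        FileDecryptionProperties.builder().withFooterKey(footerKey).build();
    ParquetReadOptions options =
        ParquetReadOptions.builder().withDecryption(decryption).build();

    try (ParquetFileReader reader =
        ParquetFileReader.open(HadoopInputFile.fromPath(file, conf), options)) {
      System.out.println(reader.getFooter().getBlocks().size() + " row groups");
    }
  }
}
{code}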



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=598434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598434
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 18/May/21 05:21
Start Date: 18/May/21 05:21
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r634047597



##
File path: 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
##
@@ -1197,6 +1241,112 @@ public boolean runDistCp(List<Path> srcPaths, Path dst, Configuration conf) throws IOException {
 }
   }
 
+  @Override
+  public boolean runDistCpWithSnapshots(String oldSnapshot, String newSnapshot, List<Path> srcPaths, Path dst, Configuration conf)
+      throws IOException {
+    DistCpOptions options =
+        new DistCpOptions.Builder(srcPaths, dst).withSyncFolder(true).withUseDiff(oldSnapshot, newSnapshot)
+            .preserve(FileAttribute.BLOCKSIZE).preserve(FileAttribute.XATTR).build();
+
+    List<String> params = constructDistCpWithSnapshotParams(srcPaths, dst, oldSnapshot, newSnapshot, conf, "-diff");
+    try {
+      conf.setBoolean("mapred.mapper.new-api", true);
+      DistCp distcp = new DistCp(conf, options);
+      int returnCode = distcp.run(params.toArray(new String[0]));
+      if (returnCode == 0) {
+        return true;
+      } else if (returnCode == DistCpConstants.INVALID_ARGUMENT) {
+        // Handles FileNotFoundException: if the source got deleted, we don't want to copy either, so treat it
+        // like success; there was nothing to copy and nothing was copied, hence no need to fail.
+        LOG.warn("Copy failed with INVALID_ARGUMENT for source: {} to target: {} snapshot1: {} snapshot2: {} "
+            + "params: {}", srcPaths, dst, oldSnapshot, newSnapshot, params);
+        return true;
+      } else if (returnCode == DistCpConstants.UNKNOWN_ERROR && conf
+          .getBoolean("hive.repl.externaltable.snapshot.overwrite.target", true)) {
+        // Check if this error is due to the target being modified.
+        if (shouldRdiff(dst, conf, oldSnapshot)) {
+          LOG.warn("Copy failed due to target modified. Attempting to restore back the target. source: {} target: {} "
+              + "snapshot: {}", srcPaths, dst, oldSnapshot);
+          List<String> rParams = constructDistCpWithSnapshotParams(srcPaths, dst, ".", oldSnapshot, conf, "-rdiff");
+          DistCp rDistcp = new DistCp(conf, options);
+          returnCode = rDistcp.run(rParams.toArray(new String[0]));
+          if (returnCode == 0) {
+            LOG.info("Target restored to previous state. source: {} target: {} snapshot: {}. Reattempting to copy.",
+                srcPaths, dst, oldSnapshot);
+            dst.getFileSystem(conf).deleteSnapshot(dst, oldSnapshot);
+            dst.getFileSystem(conf).createSnapshot(dst, oldSnapshot);
+            returnCode = distcp.run(params.toArray(new String[0]));
+            if (returnCode == 0) {
+              return true;
+            } else {
+              LOG.error("Copy failed after target restore for source: {} to target: {} snapshot1: {} snapshot2: "
+                  + "{} params: {}. Return code: {}", srcPaths, dst, oldSnapshot, newSnapshot, params, returnCode);
+              return false;
+            }
+          }
+        }
+      }
+    } catch (Exception e) {
+      throw new IOException("Cannot execute DistCp process: ", e);
+    } finally {
+      conf.setBoolean("mapred.mapper.new-api", false);
+    }
+    return false;
+  }
+
+  /**
+   * Checks whether a reverse diff on the snapshot should be performed or not.
+   * @param p path where the snapshot exists.
+   * @param conf the hive configuration.
+   * @param snapshot the name of the snapshot.
+   * @return true, if we need to do rdiff.
+   */
+  private static boolean shouldRdiff(Path p, Configuration conf, String snapshot) throws Exception {
+    // Using the configuration in string form since hive-shims doesn't have a dependency on hive-common.
+    boolean isOverwrite = conf.getBoolean("hive.repl.externaltable.snapshot.overwrite.target", true);


Review comment:
   Done, getting the value propagated from `DirCopyTask` itself.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598434)
Time Spent: 6h 50m  (was: 6h 40m)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
>  

[jira] [Work logged] (HIVE-25086) Create Ranger Deny Policy for replication db in all cases if hive.repl.ranger.target.deny.policy is set to true.

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25086?focusedWorklogId=598431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598431
 ]

ASF GitHub Bot logged work on HIVE-25086:
-

Author: ASF GitHub Bot
Created on: 18/May/21 05:15
Start Date: 18/May/21 05:15
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2240:
URL: https://github.com/apache/hive/pull/2240#discussion_r634044319



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/RangerDenyTask.java
##
@@ -131,6 +141,23 @@ public int execute() {
 }
 }
 
+private boolean isSetRangerDenyPolicyForReplicatedDb(String 
rangerEndpoint, String rangerHiveServiceName,

Review comment:
   One way could be to save this information in the dump dir so that the next 
iteration knows the Ranger deny policy was already created in a previous iteration.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598431)
Time Spent: 1h 40m  (was: 1.5h)

> Create Ranger Deny Policy for replication db in all cases if 
> hive.repl.ranger.target.deny.policy is set to true.
> 
>
> Key: HIVE-25086
> URL: https://issues.apache.org/jira/browse/HIVE-25086
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25086) Create Ranger Deny Policy for replication db in all cases if hive.repl.ranger.target.deny.policy is set to true.

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25086?focusedWorklogId=598426&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598426
 ]

ASF GitHub Bot logged work on HIVE-25086:
-

Author: ASF GitHub Bot
Created on: 18/May/21 05:08
Start Date: 18/May/21 05:08
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2240:
URL: https://github.com/apache/hive/pull/2240#discussion_r634041895



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/RangerDenyTask.java
##
@@ -131,6 +141,23 @@ public int execute() {
 }
 }
 
+private boolean isSetRangerDenyPolicyForReplicatedDb(String 
rangerEndpoint, String rangerHiveServiceName,

Review comment:
   This can be expensive. Is there any alternative to exporting all 
the Ranger policies?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598426)
Time Spent: 1.5h  (was: 1h 20m)

> Create Ranger Deny Policy for replication db in all cases if 
> hive.repl.ranger.target.deny.policy is set to true.
> 
>
> Key: HIVE-25086
> URL: https://issues.apache.org/jira/browse/HIVE-25086
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25098) [CVE-2020-13949] Upgrade thrift from 0.13.0 to 0.14.1

2021-05-17 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma updated HIVE-25098:
-
Summary: [CVE-2020-13949] Upgrade thrift from 0.13.0 to 0.14.1  (was: 
[CVE-2020-13949] Upgrade thrift from 0.13.0 to 0.14.0)

> [CVE-2020-13949] Upgrade thrift from 0.13.0 to 0.14.1
> -
>
> Key: HIVE-25098
> URL: https://issues.apache.org/jira/browse/HIVE-25098
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>
> Upgrading thrift from 0.13.0 to 0.14.0 due to 
> https://nvd.nist.gov/vuln/detail/CVE-2020-13949



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=598400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598400
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 18/May/21 02:42
Start Date: 18/May/21 02:42
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r633997089



##
File path: 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
##
@@ -1197,6 +1241,112 @@ public boolean runDistCp(List<Path> srcPaths, Path dst, Configuration conf) throws IOException {
 }
   }
 
+  @Override
+  public boolean runDistCpWithSnapshots(String oldSnapshot, String newSnapshot, List<Path> srcPaths, Path dst, Configuration conf)
+      throws IOException {
+    DistCpOptions options =
+        new DistCpOptions.Builder(srcPaths, dst).withSyncFolder(true).withUseDiff(oldSnapshot, newSnapshot)
+            .preserve(FileAttribute.BLOCKSIZE).preserve(FileAttribute.XATTR).build();
+
+    List<String> params = constructDistCpWithSnapshotParams(srcPaths, dst, oldSnapshot, newSnapshot, conf, "-diff");
+    try {
+      conf.setBoolean("mapred.mapper.new-api", true);
+      DistCp distcp = new DistCp(conf, options);
+      int returnCode = distcp.run(params.toArray(new String[0]));
+      if (returnCode == 0) {
+        return true;
+      } else if (returnCode == DistCpConstants.INVALID_ARGUMENT) {
+        // Handles FileNotFoundException: if the source got deleted, we don't want to copy either, so treat it
+        // like success; there was nothing to copy and nothing was copied, hence no need to fail.
+        LOG.warn("Copy failed with INVALID_ARGUMENT for source: {} to target: {} snapshot1: {} snapshot2: {} "
+            + "params: {}", srcPaths, dst, oldSnapshot, newSnapshot, params);
+        return true;
+      } else if (returnCode == DistCpConstants.UNKNOWN_ERROR && conf
+          .getBoolean("hive.repl.externaltable.snapshot.overwrite.target", true)) {
+        // Check if this error is due to the target being modified.
+        if (shouldRdiff(dst, conf, oldSnapshot)) {
+          LOG.warn("Copy failed due to target modified. Attempting to restore back the target. source: {} target: {} "
+              + "snapshot: {}", srcPaths, dst, oldSnapshot);
+          List<String> rParams = constructDistCpWithSnapshotParams(srcPaths, dst, ".", oldSnapshot, conf, "-rdiff");
+          DistCp rDistcp = new DistCp(conf, options);
+          returnCode = rDistcp.run(rParams.toArray(new String[0]));
+          if (returnCode == 0) {
+            LOG.info("Target restored to previous state. source: {} target: {} snapshot: {}. Reattempting to copy.",
+                srcPaths, dst, oldSnapshot);
+            dst.getFileSystem(conf).deleteSnapshot(dst, oldSnapshot);
+            dst.getFileSystem(conf).createSnapshot(dst, oldSnapshot);
+            returnCode = distcp.run(params.toArray(new String[0]));
+            if (returnCode == 0) {
+              return true;
+            } else {
+              LOG.error("Copy failed after target restore for source: {} to target: {} snapshot1: {} snapshot2: "
+                  + "{} params: {}. Return code: {}", srcPaths, dst, oldSnapshot, newSnapshot, params, returnCode);
+              return false;
+            }
+          }
+        }
+      }
+    } catch (Exception e) {
+      throw new IOException("Cannot execute DistCp process: ", e);
+    } finally {
+      conf.setBoolean("mapred.mapper.new-api", false);
+    }
+    return false;
+  }
+
+  /**
+   * Checks whether a reverse diff on the snapshot should be performed or not.
+   * @param p path where the snapshot exists.
+   * @param conf the hive configuration.
+   * @param snapshot the name of the snapshot.
+   * @return true, if we need to do rdiff.
+   */
+  private static boolean shouldRdiff(Path p, Configuration conf, String snapshot) throws Exception {
+    // Using the configuration in string form since hive-shims doesn't have a dependency on hive-common.
+    boolean isOverwrite = conf.getBoolean("hive.repl.externaltable.snapshot.overwrite.target", true);


Review comment:
   No, I meant: can you not pass the value of the config to the HadoopShims 
distcp method from whoever is calling it?
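
For reference, a hedged sketch of what this suggestion amounts to: the caller resolves the flag once and passes it down, so the shim never reads the config key itself. The interface and parameter names below are illustrative, not the actual HadoopShims signature.

{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Hypothetical shim interface: the overwrite flag arrives as a parameter rather than
// being read from the Configuration inside the shim layer.
interface SnapshotCopyShim {
  boolean runDistCpWithSnapshots(String oldSnapshot, String newSnapshot, List<Path> srcPaths,
      Path dst, boolean overwriteTarget, Configuration conf) throws IOException;
}

class DirCopyCallerSketch {
  boolean copy(SnapshotCopyShim shim, String oldSnap, String newSnap, List<Path> srcs, Path dst,
      Configuration conf) throws IOException {
    // Resolve the flag once at the call site (e.g. DirCopyTask) and pass it down.
    boolean overwriteTarget =
        conf.getBoolean("hive.repl.externaltable.snapshot.overwrite.target", true);
    return shim.runDistCpWithSnapshots(oldSnap, newSnap, srcs, dst, overwriteTarget, conf);
  }
}
{code}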




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598400)
Time Spent: 6h 40m  (was: 6.5h)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
>   

[jira] [Updated] (HIVE-25069) Hive Distributed Tracing

2021-05-17 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-25069:

Status: Open  (was: Patch Available)

I'll try a pull request instead.

> Hive Distributed Tracing
> 
>
> Key: HIVE-25069
> URL: https://issues.apache.org/jira/browse/HIVE-25069
> Project: Hive
>  Issue Type: New Feature
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-25069.01.patch, image-2021-05-10-09-20-54-688.png, 
> image-2021-05-10-09-30-44-570.png, image-2021-05-10-19-06-02-679.png
>
>
> Instrument Hive code to gather distributed traces and export trace data to a 
> configurable collector.
> Distributed tracing is a revolutionary tool for debugging issues.
> We will use the new OpenTelemetry open-source standard that our industry has 
> aligned on. OpenTelemetry is the merger of two earlier distributed tracing 
> projects, OpenTracing and OpenCensus.
> Next step: add a design document that goes into more detail on the benefits of 
> distributed tracing and describes how Hive will be enhanced.
> Also see:
> HBASE-22120 Replace HTrace with OpenTelemetry



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25130?focusedWorklogId=598389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598389
 ]

ASF GitHub Bot logged work on HIVE-25130:
-

Author: ASF GitHub Bot
Created on: 18/May/21 01:34
Start Date: 18/May/21 01:34
Worklog Time Spent: 10m 
  Work Description: kishendas commented on pull request #2285:
URL: https://github.com/apache/hive/pull/2285#issuecomment-842756108


   @harishjp Please review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598389)
Time Spent: 20m  (was: 10m)

> alter table concat gives NullPointerException, when data is inserted from 
> Spark
> ---
>
> Key: HIVE-25130
> URL: https://issues.apache.org/jira/browse/HIVE-25130
> Project: Hive
>  Issue Type: Bug
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is the complete stack trace of the NullPointerException
> 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: 
> [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception 
> 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544)
> at 
> org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304)
> at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
> at 
> org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129)
> at 
> org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.execute(AlterTableConcatenateOperation.java:63)
> at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:740)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:495)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:489)
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
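
The NPE originates in getAttemptIdFromFilename, which appears to assume Hive's own "taskId_attemptId" file naming; Spark-written files (part-... style names) don't follow it. Below is a small, hypothetical illustration of a null-safe fallback; the regex, the default value and the class name are assumptions made for illustration, not the actual patch.

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttemptIdSketch {
  // Assumed simplification of Hive's "<taskId>_<attemptId>" naming convention.
  private static final Pattern HIVE_FILE = Pattern.compile("^.*_(\\d+)(\\.[^.]+)?$");

  // Returns the parsed attempt id, or a default instead of failing on foreign file names.
  static int attemptIdOrDefault(String filename) {
    Matcher m = HIVE_FILE.matcher(filename);
    return m.matches() ? Integer.parseInt(m.group(1)) : 0;
  }

  public static void main(String[] args) {
    System.out.println(attemptIdOrDefault("000000_3"));                       // 3
    System.out.println(attemptIdOrDefault("part-00000-abc123.c000.snappy"));  // 0 (fallback)
  }
}
{code}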



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25130:
--
Labels: pull-request-available  (was: )

> alter table concat gives NullPointerException, when data is inserted from 
> Spark
> ---
>
> Key: HIVE-25130
> URL: https://issues.apache.org/jira/browse/HIVE-25130
> Project: Hive
>  Issue Type: Bug
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is the complete stack trace of the NullPointerException
> 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: 
> [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception 
> 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544)
> at 
> org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304)
> at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
> at 
> org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129)
> at 
> org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.execute(AlterTableConcatenateOperation.java:63)
> at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:740)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:495)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:489)
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25130?focusedWorklogId=598388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598388
 ]

ASF GitHub Bot logged work on HIVE-25130:
-

Author: ASF GitHub Bot
Created on: 18/May/21 01:33
Start Date: 18/May/21 01:33
Worklog Time Spent: 10m 
  Work Description: kishendas opened a new pull request #2285:
URL: https://github.com/apache/hive/pull/2285


   
   ### What changes were proposed in this pull request?
   Fix NullPointerException during alter table concat, after data is inserted 
from Spark. 
   
   
   ### Why are the changes needed?
   To fix NPE.
   
   2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception 
'java.lang.NullPointerException(null)'
   java.lang.NullPointerException
   at 
org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333)
   at 
org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966)
   at 
org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907)
   at 
org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892)
   at 
org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797)
   at 
org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674)
   at 
org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544)
   at 
org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304)
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added a unit test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598388)
Remaining Estimate: 0h
Time Spent: 10m

> alter table concat gives NullPointerException, when data is inserted from 
> Spark
> ---
>
> Key: HIVE-25130
> URL: https://issues.apache.org/jira/browse/HIVE-25130
> Project: Hive
>  Issue Type: Bug
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is the complete stack trace of the NullPointerException
> 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: 
> [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception 
> 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544)
> at 
> org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304)
> at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
> at 
> org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129)
> at 
> org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.execute(AlterTableConcatenateOperation.java:63)
> at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:740)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:495)
> at 

[jira] [Assigned] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark

2021-05-17 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das reassigned HIVE-25130:
-

Assignee: Kishen Das

> alter table concat gives NullPointerException, when data is inserted from 
> Spark
> ---
>
> Key: HIVE-25130
> URL: https://issues.apache.org/jira/browse/HIVE-25130
> Project: Hive
>  Issue Type: Bug
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> This is the complete stack trace of the NullPointerException
> 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: 
> [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception 
> 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544)
> at 
> org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304)
> at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
> at 
> org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129)
> at 
> org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.execute(AlterTableConcatenateOperation.java:63)
> at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:740)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:495)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:489)
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25072) Optimise ObjectStore::alterPartitions

2021-05-17 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HIVE-25072:
---

Assignee: Rajesh Balamohan

> Optimise ObjectStore::alterPartitions
> -
>
> Key: HIVE-25072
> URL: https://issues.apache.org/jira/browse/HIVE-25072
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Avoid fetching table details for every partition in the table.
> Ref:
>  
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L5104
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4986



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25075) Hive::loadPartitionInternal establishes HMS connection for every partition for external tables

2021-05-17 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved HIVE-25075.
-
Fix Version/s: 4.0.0
 Assignee: Rajesh Balamohan
   Resolution: Fixed

Thanks for the review, [~aasha].

I have merged the PR.

> Hive::loadPartitionInternal establishes HMS connection for every partition 
> for external tables
> --
>
> Key: HIVE-25075
> URL: https://issues.apache.org/jira/browse/HIVE-25075
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2522
> {code}
> boolean needRecycle = !tbl.isTemporary()
>   && 
> ReplChangeManager.shouldEnableCm(Hive.get().getDatabase(tbl.getDbName()), 
> tbl.getTTable());
> {code}
> Hive.get() breaks the current connection with HMS. Due to this, for external 
> table partition loads, it establishes an HMS connection for every partition.
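
A hedged sketch of the optimization idea (not the merged PR; the surrounding method and loop are illustrative): resolve the Database object and the CM decision once per table rather than once per partition, so each partition load reuses the same metastore answer.

{code:java}
import java.util.List;
import org.apache.hadoop.hive.metastore.ReplChangeManager;
import org.apache.hadoop.hive.metastore.api.Database;
import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.hive.ql.metadata.Partition;
import org.apache.hadoop.hive.ql.metadata.Table;

public class LoadPartitionsSketch {
  static void loadPartitions(Table tbl, List<Partition> partitions) throws Exception {
    // One metastore lookup for the whole table instead of one per partition load.
    Database db = Hive.get().getDatabase(tbl.getDbName());
    boolean cmEnabled = ReplChangeManager.shouldEnableCm(db, tbl.getTTable());
    for (Partition partition : partitions) {
      boolean needRecycle = !tbl.isTemporary() && cmEnabled;
      // ... the per-partition load would use needRecycle here instead of calling Hive.get() again ...
    }
  }
}
{code}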



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25072) Optimise ObjectStore::alterPartitions

2021-05-17 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved HIVE-25072.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Thanks for the review, [~aasha].

I have merged the PR.

> Optimise ObjectStore::alterPartitions
> -
>
> Key: HIVE-25072
> URL: https://issues.apache.org/jira/browse/HIVE-25072
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Avoid fetching table details for every partition in the table.
> Ref:
>  
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L5104
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4986



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25075) Hive::loadPartitionInternal establishes HMS connection for every partition for external tables

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25075?focusedWorklogId=598354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598354
 ]

ASF GitHub Bot logged work on HIVE-25075:
-

Author: ASF GitHub Bot
Created on: 17/May/21 23:59
Start Date: 17/May/21 23:59
Worklog Time Spent: 10m 
  Work Description: rbalamohan merged pull request #2234:
URL: https://github.com/apache/hive/pull/2234


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598354)
Time Spent: 50m  (was: 40m)

> Hive::loadPartitionInternal establishes HMS connection for every partition 
> for external tables
> --
>
> Key: HIVE-25075
> URL: https://issues.apache.org/jira/browse/HIVE-25075
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2522
> {code}
> boolean needRecycle = !tbl.isTemporary()
>   && 
> ReplChangeManager.shouldEnableCm(Hive.get().getDatabase(tbl.getDbName()), 
> tbl.getTTable());
> {code}
> Hive.get() breaks the current connection with HMS. Due to this, for external 
> table partition loads, it establishes an HMS connection for every partition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25075) Hive::loadPartitionInternal establishes HMS connection for every partition for external tables

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25075?focusedWorklogId=598353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598353
 ]

ASF GitHub Bot logged work on HIVE-25075:
-

Author: ASF GitHub Bot
Created on: 17/May/21 23:59
Start Date: 17/May/21 23:59
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #2234:
URL: https://github.com/apache/hive/pull/2234#discussion_r633939545



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -3202,7 +3202,7 @@ public void loadTable(Path loadPath, String tableName, 
LoadFileType loadFileType
 //for fullAcid we don't want to delete any files even for OVERWRITE 
see HIVE-14988/HIVE-17361
 boolean isSkipTrash = MetaStoreUtils.isSkipTrash(tbl.getParameters());
 boolean needRecycle = !tbl.isTemporary()
-&& 
ReplChangeManager.shouldEnableCm(Hive.get().getDatabase(tbl.getDbName()), 
tbl.getTTable());

Review comment:
   Had to revert the changes in the other place. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598353)
Time Spent: 40m  (was: 0.5h)

> Hive::loadPartitionInternal establishes HMS connection for every partition 
> for external tables
> --
>
> Key: HIVE-25075
> URL: https://issues.apache.org/jira/browse/HIVE-25075
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2522
> {code}
> boolean needRecycle = !tbl.isTemporary()
>   && 
> ReplChangeManager.shouldEnableCm(Hive.get().getDatabase(tbl.getDbName()), 
> tbl.getTTable());
> {code}
> Hive.get() breaks the current connection with HMS. Due to this, for external 
> table partition loads, it establishes an HMS connection for every partition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25072) Optimise ObjectStore::alterPartitions

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25072?focusedWorklogId=598352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598352
 ]

ASF GitHub Bot logged work on HIVE-25072:
-

Author: ASF GitHub Bot
Created on: 17/May/21 23:56
Start Date: 17/May/21 23:56
Worklog Time Spent: 10m 
  Work Description: rbalamohan merged pull request #2235:
URL: https://github.com/apache/hive/pull/2235


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598352)
Time Spent: 0.5h  (was: 20m)

> Optimise ObjectStore::alterPartitions
> -
>
> Key: HIVE-25072
> URL: https://issues.apache.org/jira/browse/HIVE-25072
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Avoid fetching table details for every partition in the table.
> Ref:
>  
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L5104
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4986



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-21935) Hive Vectorization : degraded performance with vectorize UDF

2021-05-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-21935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman resolved HIVE-21935.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master. Thanks [~pgaref] [~abstractdog] [~rajesh.balamohan] and 
[~gopalv] for reviews.

> Hive Vectorization : degraded performance with vectorize UDF  
> --
>
> Key: HIVE-21935
> URL: https://issues.apache.org/jira/browse/HIVE-21935
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.1.1
> Environment: Hive-3, JDK-8
>Reporter: Rajkumar Singh
>Assignee: Mustafa İman
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 4.0.0
>
> Attachments: CustomSplit-1.0-SNAPSHOT.jar
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> With vectorization turned on and hive.vectorized.adaptor.usage.mode=all we 
> were seeing severe performance degradation. Looking at the task jstacks, it 
> seems it is running the code that vectorizes the UDF and is stuck in some loop.
> {code:java}
> jstack -l 14954 | grep 0x3af0 -A20
> "TezChild" #15 daemon prio=5 os_prio=0 tid=0x7f157538d800 nid=0x3af0 
> runnable [0x7f1547581000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:573)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:205)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.ListIndexColScalar.evaluate(ListIndexColScalar.java:59)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:889)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> [yarn@hdp32b ~]$ jstack -l 14954 | grep 0x3af0 -A20
> "TezChild" #15 daemon prio=5 os_prio=0 tid=0x7f157538d800 nid=0x3af0 
> runnable [0x7f1547581000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.ensureSize(BytesColumnVector.java:554)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:570)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:205)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.ListIndexColScalar.evaluate(ListIndexColScalar.java:59)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:889)
>   at 
> 
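
Since the regression is reported with hive.vectorized.adaptor.usage.mode=all, a hedged workaround sketch for affected versions (not the merged fix) is to restrict or disable the vectorized UDF adaptor for the session:

{code}
-- limit the VectorUDFAdaptor to the UDFs Hive itself chooses:
set hive.vectorized.adaptor.usage.mode=chosen;
-- or turn the adaptor off entirely (custom UDFs then run un-vectorized):
set hive.vectorized.adaptor.usage.mode=none;
{code}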

[jira] [Work logged] (HIVE-21935) Hive Vectorization : degraded performance with vectorize UDF

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21935?focusedWorklogId=598337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598337
 ]

ASF GitHub Bot logged work on HIVE-21935:
-

Author: ASF GitHub Bot
Created on: 17/May/21 23:38
Start Date: 17/May/21 23:38
Worklog Time Spent: 10m 
  Work Description: mustafaiman closed pull request #2242:
URL: https://github.com/apache/hive/pull/2242


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598337)
Time Spent: 1h 40m  (was: 1.5h)

> Hive Vectorization : degraded performance with vectorize UDF  
> --
>
> Key: HIVE-21935
> URL: https://issues.apache.org/jira/browse/HIVE-21935
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.1.1
> Environment: Hive-3, JDK-8
>Reporter: Rajkumar Singh
>Assignee: Mustafa İman
>Priority: Major
>  Labels: performance, pull-request-available
> Attachments: CustomSplit-1.0-SNAPSHOT.jar
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> With vectorization turned on and hive.vectorized.adaptor.usage.mode=all we 
> were seeing severe performance degradation. Looking at the task jstacks, it 
> seems the code that vectorizes the UDF is running and stuck in some loop.
> {code:java}
> jstack -l 14954 | grep 0x3af0 -A20
> "TezChild" #15 daemon prio=5 os_prio=0 tid=0x7f157538d800 nid=0x3af0 
> runnable [0x7f1547581000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:573)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:205)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.ListIndexColScalar.evaluate(ListIndexColScalar.java:59)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:889)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> [yarn@hdp32b ~]$ jstack -l 14954 | grep 0x3af0 -A20
> "TezChild" #15 daemon prio=5 os_prio=0 tid=0x7f157538d800 nid=0x3af0 
> runnable [0x7f1547581000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.ensureSize(BytesColumnVector.java:554)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:570)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:205)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.ListIndexColScalar.evaluate(ListIndexColScalar.java:59)
>   at 
> 
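
For readers hitting the same adaptor hot path, a minimal workaround sketch (not part of any patch here) is to limit the adaptor to its vetted UDF list instead of wrapping every custom UDF; the property is the one quoted in the report, and the Java form below is only illustrative:

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

public class AdaptorUsageSketch {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // "all" (as in the report) routes every custom UDF through VectorUDFAdaptor and the
    // row-by-row VectorAssignRow/BytesColumnVector.ensureSize path seen in the jstack;
    // "chosen" restricts the adaptor to the curated UDF set, and "none" disables it.
    conf.set("hive.vectorized.adaptor.usage.mode", "chosen");
    conf.set("hive.vectorized.execution.enabled", "true");
    System.out.println(conf.get("hive.vectorized.adaptor.usage.mode"));
  }
}
{code}

The equivalent session-level form is a plain {{set hive.vectorized.adaptor.usage.mode=chosen;}} before running the query.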

[jira] [Commented] (HIVE-24899) create database event does not include managedLocation URI

2021-05-17 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346473#comment-17346473
 ] 

Vihang Karajgaonkar commented on HIVE-24899:


Looks like this may have been superseded by HIVE-24175.

> create database event does not include managedLocation URI
> --
>
> Key: HIVE-24899
> URL: https://issues.apache.org/jira/browse/HIVE-24899
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Priority: Minor
>
> I noticed that when a database is created, the Metastore-generated notification 
> event for the database doesn't have the managed location set. If I do a 
> getDatabase call later, the metastore returns the managedLocationUri. This seems 
> like an inconsistency, and it would be good if the generated event included the 
> managedLocationUri as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24899) create database event does not include managedLocation URI

2021-05-17 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved HIVE-24899.

Resolution: Duplicate

> create database event does not include managedLocation URI
> --
>
> Key: HIVE-24899
> URL: https://issues.apache.org/jira/browse/HIVE-24899
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Priority: Minor
>
> I noticed that when a database is created, the Metastore-generated notification 
> event for the database doesn't have the managed location set. If I do a 
> getDatabase call later, the metastore returns the managedLocationUri. This seems 
> like an inconsistency, and it would be good if the generated event included the 
> managedLocationUri as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346433#comment-17346433
 ] 

David Mollitor edited comment on HIVE-25126 at 5/17/21, 9:48 PM:
-

As I work through this, I see that in {{RawStore}}
{code:java}
 /**
  * 
  * @throws MetaException general database exception
  */

 /**
  * 
  * @throws MetaException something went wrong, usually in the RDBMS or storage
  */ 
{code}
That's not actually true. A "general database exception" is almost never caught 
in {{ObjectStore}} and thrown as a {{MetaException}}. All "general database 
exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and bubble-up 
(and never handled). All the more reason to get rid of this.

 

Also, {{RawStore}} should be storage agnostic, so things like "RDBMS" in a 
comment shouldn't be permissible.


was (Author: belugabehr):
As I work through this, I see that in {{RawStore}}

{code:java}
 /**
  * 
  * @throws MetaException general database exception
  */

 /**
  * 
  * @throws MetaException something went wrong, usually in the RDBMS or storage
  */ 
{code}

That's not actually true.  A "general database exception" is almost never 
caught in {{ObjectStore}} and thrown as a {{MetaException}}.  All "general 
database exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and 
bubble-up (and never handled).  All the more reason to get rid of this.

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a popular format and is used (and therefore dovetails 
> nicely) with the underlying database access library DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove Thrift Exception to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
>  
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller thus tying the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
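
To make the proposed split concrete, here is a hedged sketch (all names are illustrative, not taken from the PR) of the translation layer the comment describes: the RawStore method stops declaring Thrift exceptions, and the Hive-Thrift bridge catches the unchecked persistence failures and rethrows them as {{MetaException}}.

{code:java}
import org.apache.hadoop.hive.metastore.api.MetaException;

public final class ThriftBridgeSketch {

  /** Simplified stand-in for a RawStore method after the change: no checked Thrift exceptions. */
  interface RawStoreLike {
    Object getDatabase(String catalogName, String dbName);
  }

  /** The HMSHandler-style bridge is the only place that speaks Thrift exceptions. */
  static Object getDatabaseOverThrift(RawStoreLike store, String catalogName, String dbName)
      throws MetaException {
    try {
      return store.getDatabase(catalogName, dbName);
    } catch (RuntimeException e) {
      // DataNucleus/JDO failures arrive here as unchecked exceptions; log and translate once.
      throw new MetaException("Failed to read database " + dbName + ": " + e.getMessage());
    }
  }
}
{code}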


[jira] [Commented] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346442#comment-17346442
 ] 

David Mollitor commented on HIVE-25126:
---

{{MetaException}} is also used inconsistently.  Some functions that clearly access the 
DB do not declare this exception in their signatures.

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a popular format and is used (and therefore dovetails 
> nicely) with the underlying database access library DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove Thrift Exception to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
>  
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller thus tying the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346433#comment-17346433
 ] 

David Mollitor edited comment on HIVE-25126 at 5/17/21, 9:07 PM:
-

As I work through this, I see that in {{RawStore}}

{code:java}
 /**
  * 
  * @throws MetaException general database exception
  */

 /**
  * 
  * @throws MetaException something went wrong, usually in the RDBMS or storage
  */ 
{code}

That's not actually true.  A "general database exception" is almost never 
caught in {{ObjectStore}} and thrown as a {{MetaException}}.  All "general 
database exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and 
bubble-up (and never handled).  All the more reason to get rid of this.


was (Author: belugabehr):
As I work through this, I see that in {{RawStore}}

{code:java}
 /**
  *
  * @throws MetaException general database exception
   */
{code}

That's not actually true.  A "general database exception" is almost never 
caught in {{ObjectStore}} and thrown as a {{MetaException}}.  All "general 
database exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and 
bubble-up (and never handled).  All the more reason to get rid of this.

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a popular format and is used (and therefore dovetails 
> nicely) with the underlying database access library DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove Thrift Exception to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
>  
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller thus tying the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25069) Hive Distributed Tracing

2021-05-17 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-25069:

Status: Patch Available  (was: In Progress)

A first work-in-progress patch. Work was done on branch-3.1, and manually 
merging changes to master is tedious. The tracing infrastructure modules are in, 
but only a few Hive classes have been merged; enough, though, to give Hive QA a 
run. Tracing will be exported to a logging-only exporter.

> Hive Distributed Tracing
> 
>
> Key: HIVE-25069
> URL: https://issues.apache.org/jira/browse/HIVE-25069
> Project: Hive
>  Issue Type: New Feature
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-25069.01.patch, image-2021-05-10-09-20-54-688.png, 
> image-2021-05-10-09-30-44-570.png, image-2021-05-10-19-06-02-679.png
>
>
> Instrument Hive code to gather distributed traces and export trace data to a 
> configurable collector.
> Distributed tracing is a revolutionary tool for debugging issues.
> We will use the new OpenTelemetry open-source standard that our industry has 
> aligned on. OpenTelemetry is the merger of two earlier distributed tracing 
> projects OpenTracing and OpenCensus.
> Next step: Add design document that goes into more detail on the benefits of 
> distributed tracing and describes how Hive will be enhanced.
> Also see:
> HBASE-22120 Replace HTrace with OpenTelemetry



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
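
For orientation, a minimal sketch of the kind of OpenTelemetry instrumentation described above (the span names and tracer scope are assumptions, not taken from HIVE-25069.01.patch); with a logging-only exporter configured, finished spans simply end up in the log:

{code:java}
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public final class HiveTraceSketch {
  private static final Tracer TRACER = GlobalOpenTelemetry.getTracer("org.apache.hive.ql");

  static void runTraced(Runnable compile, Runnable execute) {
    Span query = TRACER.spanBuilder("hive.query").startSpan();
    try (Scope ignored = query.makeCurrent()) {
      compile.run();  // child spans (parse, optimize, submit) would be created the same way
      execute.run();
    } finally {
      query.end();    // the configured exporter (logging-only for now) receives the finished span
    }
  }
}
{code}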


[jira] [Updated] (HIVE-25069) Hive Distributed Tracing

2021-05-17 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-25069:

Attachment: HIVE-25069.01.patch

> Hive Distributed Tracing
> 
>
> Key: HIVE-25069
> URL: https://issues.apache.org/jira/browse/HIVE-25069
> Project: Hive
>  Issue Type: New Feature
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-25069.01.patch, image-2021-05-10-09-20-54-688.png, 
> image-2021-05-10-09-30-44-570.png, image-2021-05-10-19-06-02-679.png
>
>
> Instrument Hive code to gather distributed traces and export trace data to a 
> configurable collector.
> Distributed tracing is a revolutionary tool for debugging issues.
> We will use the new OpenTelemetry open-source standard that our industry has 
> aligned on. OpenTelemetry is the merger of two earlier distributed tracing 
> projects OpenTracing and OpenCensus.
> Next step: Add design document that goes into more detail on the benefits of 
> distributed tracing and describes how Hive will be enhanced.
> Also see:
> HBASE-22120 Replace HTrace with OpenTelemetry



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346433#comment-17346433
 ] 

David Mollitor edited comment on HIVE-25126 at 5/17/21, 8:55 PM:
-

As I work through this, I see that in {{RawStore}}

{code:java}
 /**
  *
  * @throws MetaException general database exception
   */
{code}

That's not actually true.  A "general database exception" is almost never 
caught in {{ObjectStore}} and thrown as a {{MetaException}}.  All "general 
database exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and 
bubble-up (and never handled).  All the more reason to get rid of this.


was (Author: belugabehr):
As I work through this, I see that in {{RawStore}}

{code:java}
 /**
  *
  * @throws MetaException general database exception
   */
{code}

That's not actually true.  A "general database exception" is almost never 
caught in {{ObjectStore}} and thrown as a {{MetaException}}.  All "general 
database exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and 
bubble-up (and never handled)

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a popular format and is used (and therefore dovetails 
> nicely) with the underlying database access library DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove Thrift Exception to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
>  
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller thus tying the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346433#comment-17346433
 ] 

David Mollitor commented on HIVE-25126:
---

As I work through this, I see that in {{RawStore}}

{code:java}
 /**
  *
  * @throws MetaException general database exception
   */
{code}

That's not actually true.  A "general database exception" is almost never 
caught in {{ObjectStore}} and thrown as a {{MetaException}}.  All "general 
database exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and 
bubble-up (and never handled)

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a popular format and is used (and therefore dovetails 
> nicely) with the underlying database access library DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove Thrift Exception to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
>  
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller thus tying the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25129) Wrong results when timestamps stored in Avro/Parquet fall into the DST shift

2021-05-17 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-25129:
--


> Wrong results when timestamps stored in Avro/Parquet fall into the DST shift
> 
>
> Key: HIVE-25129
> URL: https://issues.apache.org/jira/browse/HIVE-25129
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Timestamp values falling into the daylight saving time shift of the system 
> timezone cannot be retrieved as-is when they are stored in Parquet/Avro 
> tables. The respective SELECT query shifts those timestamps by +1 hour, 
> reflecting the DST shift.
> +Example+
> {code:sql}
> --! qt:timezone:US/Pacific
> create table employee (eid int, birthdate timestamp) stored as parquet;
> insert into employee values (0, '2019-03-10 02:00:00');
> insert into employee values (1, '2020-03-08 02:00:00');
> insert into employee values (2, '2021-03-14 02:00:00');
> select eid, birthdate from employee order by eid;{code}
> +Actual results+
> |0|2019-03-10 03:00:00|
> |1|2020-03-08 03:00:00|
> |2|2021-03-14 03:00:00|
> +Expected results+
> |0|2019-03-10 02:00:00|
> |1|2020-03-08 02:00:00|
> |2|2021-03-14 02:00:00|
> Storing and retrieving values in columns using the [timestamp data 
> type|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types]
>  (equivalent to the LocalDateTime Java API) should not alter in any way the 
> value that the user is seeing. The results are correct for {{TEXTFILE}} and 
> {{ORC}} tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
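
The shift itself is just the spring-forward gap: 02:00 on those dates does not exist in US/Pacific, so any code path that resolves the wall-clock value against the session/system zone lands one hour later. A small self-contained illustration (plain java.time, not Hive code):

{code:java}
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public final class DstGapDemo {
  public static void main(String[] args) {
    LocalDateTime stored = LocalDateTime.parse("2021-03-14T02:00:00");
    ZonedDateTime resolved = stored.atZone(ZoneId.of("US/Pacific"));
    // The 02:00-03:00 hour is skipped on 2021-03-14, so the value is pushed past the gap:
    System.out.println(resolved.toLocalDateTime()); // 2021-03-14T03:00
  }
}
{code}

That is exactly the +1 hour seen in the actual results above; TEXTFILE and ORC presumably avoid it because their read/write path never resolves the local value against the zone in this way.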


[jira] [Assigned] (HIVE-25128) Remove Thrift Exceptions From RawStore alterCatalog

2021-05-17 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25128:
-


> Remove Thrift Exceptions From RawStore alterCatalog
> ---
>
> Key: HIVE-25128
> URL: https://issues.apache.org/jira/browse/HIVE-25128
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24532) Reduce sink vectorization mixes column types

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24532?focusedWorklogId=598196=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598196
 ]

ASF GitHub Bot logged work on HIVE-24532:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:43
Start Date: 17/May/21 18:43
Worklog Time Spent: 10m 
  Work Description: mustafaiman opened a new pull request #2284:
URL: https://github.com/apache/hive/pull/2284


   Change-Id: Id0705fb5c7d71f6a63d5ec3fb303fd7be90acac1
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598196)
Remaining Estimate: 0h
Time Spent: 10m

> Reduce sink vectorization mixes column types
> 
>
> Key: HIVE-24532
> URL: https://issues.apache.org/jira/browse/HIVE-24532
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
> Attachments: castexception.txt, explainplan.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I do an insert overwrite select on a partitioned table. The partition column is 
> specified dynamically from the select query. The "ceil" function is applied on a 
> string column to specify the partition for each row. The reduce sink gets confused 
> about the type of the partition column, which leads to the following cast exception 
> at runtime:
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializePrimitiveWrite(VectorSerializeRow.java:452)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:279)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:258)
> at 
> org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkObjectHashOperator.processKey(VectorReduceSinkObjectHashOperator.java:305)
> ... 28 more
> {code}
> The problem is reproducible by running mvn test 
> -Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert0.q with "set 
> hive.stats.autogather=false". The additional config option causes insert 
> statements to be vectorized so the vectorization bug appears.
> insert0.q: 
> [https://github.com/apache/hive/blob/fb046c77257d648d0ee232356bdf665772b28bdd/ql/src/test/queries/clientpositive/insert0.q]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
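
Stripped to its essentials, the runtime failure is a column-vector type mismatch between what the plan materialized for the dynamic partition key and what the key serializer was generated to expect; a tiny illustration (not the actual reduce sink code) using the same vector classes:

{code:java}
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

public final class MixedColumnTypeDemo {
  public static void main(String[] args) {
    VectorizedRowBatch batch = new VectorizedRowBatch(1);
    // The vectorizer produced a long vector for the ceil(...) expression ...
    batch.cols[0] = new LongColumnVector();
    // ... while the serializer was keyed to a string-typed partition column:
    BytesColumnVector partitionKey = (BytesColumnVector) batch.cols[0]; // ClassCastException, as in the trace
    System.out.println(partitionKey);
  }
}
{code}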


[jira] [Updated] (HIVE-24532) Reduce sink vectorization mixes column types

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24532:
--
Labels: pull-request-available  (was: )

> Reduce sink vectorization mixes column types
> 
>
> Key: HIVE-24532
> URL: https://issues.apache.org/jira/browse/HIVE-24532
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
> Attachments: castexception.txt, explainplan.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I do an insert overwrite select on a partitioned table. The partition column is 
> specified dynamically from the select query. The "ceil" function is applied on a 
> string column to specify the partition for each row. The reduce sink gets confused 
> about the type of the partition column, which leads to the following cast exception 
> at runtime:
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializePrimitiveWrite(VectorSerializeRow.java:452)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:279)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:258)
> at 
> org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkObjectHashOperator.processKey(VectorReduceSinkObjectHashOperator.java:305)
> ... 28 more
> {code}
> The problem is reproducible by running mvn test 
> -Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert0.q with "set 
> hive.stats.autogather=false". The additional config option causes insert 
> statements to be vectorized so the vectorization bug appears.
> insert0.q: 
> [https://github.com/apache/hive/blob/fb046c77257d648d0ee232356bdf665772b28bdd/ql/src/test/queries/clientpositive/insert0.q]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25126?focusedWorklogId=598186=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598186
 ]

ASF GitHub Bot logged work on HIVE-25126:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:42
Start Date: 17/May/21 18:42
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #2283:
URL: https://github.com/apache/hive/pull/2283


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598186)
Remaining Estimate: 0h
Time Spent: 10m

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a popular format and is used (and therefore dovetails 
> nicely) with the underlying database access library DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove Thrift Exception to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
>  
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller thus tying the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25126:
--
Labels: pull-request-available  (was: )

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a popular format and is used (and therefore dovetails 
> nicely) with the underlying database access library DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove Thrift Exception to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
>  
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller thus tying the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=598172=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598172
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:41
Start Date: 17/May/21 18:41
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r633751758



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosUsingSnapshots.java
##
@@ -1258,6 +1259,134 @@ private void validateDiffSnapshotsCreated(String 
location) throws Exception {
 dfs.getFileStatus(new Path(locationPath, ".snapshot/" + 
secondSnapshot(primaryDbName.toLowerCase();
   }
 
+  @Test
+  public void testSnapshotsWithFiltersCustomDbLevelPaths() throws Throwable {
+// Directory Structure:
+//   /prefix/project/  <- Specified as custom Location. (Snapshot Root)
+//       /randomStuff  <- Not to be copied as part of external data copy
+//       /warehouse1   <- To be copied, Contains table1 & table2
+//       /warehouse2   <- To be copied, Contains table3 & table4
+
+// Create /prefix/project
+Path project = new Path("/" + testName.getMethodName() + "/project");
+DistributedFileSystem fs = primary.miniDFSCluster.getFileSystem();
+fs.mkdirs(project);
+
+// Create /prefix/project/warehouse1
+Path warehouse1 = new Path(project, "warehouse1");
+fs.mkdirs(warehouse1);
+
+// Create /prefix/project/warehouse2
+Path warehouse2 = new Path(project, "warehouse2");
+fs.mkdirs(warehouse2);
+
+// Table1 Path: /prefix/project/warehouse1/table1
+Path table1 = new Path(warehouse1, "table1");
+fs.mkdirs(table1);
+
+// Table2 Path: /prefix/project/warehouse1/table2
+Path table2 = new Path(warehouse1, "table2");
+fs.mkdirs(table2);
+
+// Table3 Path: /prefix/project/warehouse2/table3
+Path table3 = new Path(warehouse2, "table3");
+fs.mkdirs(table3);
+
+// Table4 Path: /prefix/project/warehouse2/table4
+Path table4 = new Path(warehouse2, "table4");
+fs.mkdirs(table4);
+
+// Random Dir inside the /prefix/project
+Path random = new Path(project, "randomStuff");
+fs.mkdirs(random);
+
+fs.create(new Path(random, "file1")).close();
+fs.create(new Path(random, "file2")).close();
+fs.create(new Path(random, "file3")).close();
+
+// Create a filter file for DistCp
+Path filterFile = new Path("/tmp/filter");
+try(FSDataOutputStream stream = fs.create(filterFile)) {
+  stream.writeBytes(".*randomStuff.*");
+}
+assertTrue(fs.exists(filterFile.makeQualified(fs.getUri(), 
fs.getWorkingDirectory(;
+FileWriter myWriter = new FileWriter("/tmp/filter");
+myWriter.write(".*randomStuff.*");
+myWriter.close();
+
+// Specify the project directory as the snapshot root using the single 
copy task path config.
+List withClause = 
ReplicationTestUtils.includeExternalTableClause(true);
+withClause.add("'"
++ REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK_PATHS.varname + "'='" + 
project
+.makeQualified(fs.getUri(), fs.getWorkingDirectory()).toString() + 
"'");
+
+// Add Filter file
+withClause.add("'distcp.options.filters'='" + "/tmp/filter" + "'");

Review comment:
   Done

##
File path: 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
##
@@ -1197,6 +1241,112 @@ public boolean runDistCp(List srcPaths, Path dst, 
Configuration conf) thro
 }
   }
 
+  @Override
+  public boolean runDistCpWithSnapshots(String oldSnapshot, String 
newSnapshot, List srcPaths, Path dst, Configuration conf)
+  throws IOException {
+DistCpOptions options =
+new DistCpOptions.Builder(srcPaths, 
dst).withSyncFolder(true).withUseDiff(oldSnapshot, newSnapshot)
+
.preserve(FileAttribute.BLOCKSIZE).preserve(FileAttribute.XATTR).build();
+
+List params = constructDistCpWithSnapshotParams(srcPaths, dst, 
oldSnapshot, newSnapshot, conf, "-diff");
+try {
+  conf.setBoolean("mapred.mapper.new-api", true);
+  DistCp distcp = new DistCp(conf, options);
+  int returnCode = distcp.run(params.toArray(new String[0]));
+  if (returnCode == 0) {
+return true;
+  } else if (returnCode == DistCpConstants.INVALID_ARGUMENT) {
+// Handle FileNotFoundException: if the source got deleted, we don't want to copy either, so it is
+// like a success case; we didn't have anything to copy and we copied nothing, so we need not fail.
+LOG.warn("Copy failed with INVALID_ARGUMENT for source: {} to target: 
{} snapshot1: {} snapshot2: {} "
++ "params: {}", srcPaths, dst, oldSnapshot, 

[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=598166=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598166
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:40
Start Date: 17/May/21 18:40
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r633565588



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosUsingSnapshots.java
##
@@ -1258,6 +1259,134 @@ private void validateDiffSnapshotsCreated(String 
location) throws Exception {
 dfs.getFileStatus(new Path(locationPath, ".snapshot/" + 
secondSnapshot(primaryDbName.toLowerCase();
   }
 
+  @Test
+  public void testSnapshotsWithFiltersCustomDbLevelPaths() throws Throwable {
+// Directory Structure:
+//   /prefix/project/  <- Specified as custom Location. (Snapshot Root)
+//       /randomStuff  <- Not to be copied as part of external data copy
+//       /warehouse1   <- To be copied, Contains table1 & table2
+//       /warehouse2   <- To be copied, Contains table3 & table4
+
+// Create /prefix/project
+Path project = new Path("/" + testName.getMethodName() + "/project");
+DistributedFileSystem fs = primary.miniDFSCluster.getFileSystem();
+fs.mkdirs(project);
+
+// Create /prefix/project/warehouse1
+Path warehouse1 = new Path(project, "warehouse1");
+fs.mkdirs(warehouse1);
+
+// Create /prefix/project/warehouse2
+Path warehouse2 = new Path(project, "warehouse2");
+fs.mkdirs(warehouse2);
+
+// Table1 Path: /prefix/project/warehouse1/table1
+Path table1 = new Path(warehouse1, "table1");
+fs.mkdirs(table1);
+
+// Table2 Path: /prefix/project/warehouse1/table2
+Path table2 = new Path(warehouse1, "table2");
+fs.mkdirs(table2);
+
+// Table3 Path: /prefix/project/warehouse2/table3
+Path table3 = new Path(warehouse2, "table3");
+fs.mkdirs(table3);
+
+// Table4 Path: /prefix/project/warehouse2/table4
+Path table4 = new Path(warehouse2, "table4");
+fs.mkdirs(table4);
+
+// Random Dir inside the /prefix/project
+Path random = new Path(project, "randomStuff");
+fs.mkdirs(random);
+
+fs.create(new Path(random, "file1")).close();
+fs.create(new Path(random, "file2")).close();
+fs.create(new Path(random, "file3")).close();
+
+// Create a filter file for DistCp
+Path filterFile = new Path("/tmp/filter");
+try(FSDataOutputStream stream = fs.create(filterFile)) {
+  stream.writeBytes(".*randomStuff.*");
+}
+assertTrue(fs.exists(filterFile.makeQualified(fs.getUri(), 
fs.getWorkingDirectory(;
+FileWriter myWriter = new FileWriter("/tmp/filter");
+myWriter.write(".*randomStuff.*");
+myWriter.close();
+
+// Specify the project directory as the snapshot root using the single 
copy task path config.
+List withClause = 
ReplicationTestUtils.includeExternalTableClause(true);
+withClause.add("'"
++ REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK_PATHS.varname + "'='" + 
project
+.makeQualified(fs.getUri(), fs.getWorkingDirectory()).toString() + 
"'");
+
+// Add Filter file
+withClause.add("'distcp.options.filters'='" + "/tmp/filter" + "'");

Review comment:
   Clean up the filter file after the test

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -675,6 +675,16 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 + " table or partition level. If hive.exec.parallel \n"
 + "is set to true then max worker threads created for copy can be 
hive.exec.parallel.thread.number(determines \n"
 + "number of copy tasks in parallel) * hive.repl.parallel.copy.tasks 
"),
+
REPL_SNAPSHOT_DIFF_FOR_EXTERNAL_TABLE_COPY("hive.repl.externaltable.snapshotdiff.copy",
+false,"Use snapshot diff for copying data from source to "
++ "destination cluster for external table in distcp. If true it uses 
snapshot based distcp for all the paths "
++ "configured as part of hive.repl.external.warehouse.single.copy.task 
along with the external warehouse "
++ "default location."),
+
REPL_SNAPSHOT_OVERWRITE_TARGET_FOR_EXTERNAL_TABLE_COPY("hive.repl.externaltable.snapshot.overwrite.target",

Review comment:
   where are you not taking the custom location paths?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -217,7 +224,12 @@ public int execute() {
   throw e;
 } catch (Exception e) {
   setException(e);
-  int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+  int errorCode;
+  if 

[jira] [Work logged] (HIVE-23571) [CachedStore] Add ValidWriteIdList to SharedCache.TableWrapper

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23571?focusedWorklogId=598149=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598149
 ]

ASF GitHub Bot logged work on HIVE-23571:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:38
Start Date: 17/May/21 18:38
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma edited a comment on pull request 
#2128:
URL: https://github.com/apache/hive/pull/2128#issuecomment-841813660


   @kgyrtkirk Could you please review the PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598149)
Time Spent: 2h  (was: 1h 50m)

> [CachedStore] Add ValidWriteIdList to SharedCache.TableWrapper
> --
>
> Key: HIVE-23571
> URL: https://issues.apache.org/jira/browse/HIVE-23571
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add ValidWriteIdList to SharedCache.TableWrapper. This would be used in 
> deciding whether a given read request can be served from the cache or we have 
> to reload it from the backing database. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
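
A hedged sketch of the freshness check the description implies (the shape is assumed; the real SharedCache logic is more involved and also has to compare open/aborted write ids):

{code:java}
import org.apache.hadoop.hive.common.ValidWriteIdList;

public final class CacheFreshnessSketch {
  /** Serve from cache only if the cached entry was built against state at least as new as requested. */
  static boolean canServeFromCache(ValidWriteIdList cached, ValidWriteIdList requested) {
    if (cached == null || requested == null) {
      return false; // nothing recorded for this table yet -> reload from the backing database
    }
    return cached.getHighWatermark() >= requested.getHighWatermark()
        && cached.getInvalidWriteIds().length == 0; // crude: any invalid ids would force a reload too
  }
}
{code}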


[jira] [Work logged] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25104?focusedWorklogId=598120=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598120
 ]

ASF GitHub Bot logged work on HIVE-25104:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:35
Start Date: 17/May/21 18:35
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2282:
URL: https://github.com/apache/hive/pull/2282#discussion_r633480041



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
##
@@ -523,10 +532,30 @@ private static MessageType getRequestedPrunedSchema(
   configuration, 
HiveConf.ConfVars.HIVE_PARQUET_DATE_PROLEPTIC_GREGORIAN_DEFAULT)));
 }
 
-String legacyConversion = 
ConfVars.HIVE_PARQUET_TIMESTAMP_LEGACY_CONVERSION_ENABLED.varname;
-if (!metadata.containsKey(legacyConversion)) {
-  metadata.put(legacyConversion, String.valueOf(HiveConf.getBoolVar(
-  configuration, 
HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_LEGACY_CONVERSION_ENABLED)));
+if 
(!metadata.containsKey(DataWritableWriteSupport.WRITER_ZONE_CONVERSION_LEGACY)) 
{
+  final String legacyConversion;
+  
if(keyValueMetaData.containsKey(DataWritableWriteSupport.WRITER_ZONE_CONVERSION_LEGACY))
 {
+// If there is meta about the legacy conversion then the file should 
be read in the same way it was written. 
+legacyConversion = 
keyValueMetaData.get(DataWritableWriteSupport.WRITER_ZONE_CONVERSION_LEGACY);
+  } else 
if(keyValueMetaData.containsKey(DataWritableWriteSupport.WRITER_TIMEZONE)) {
+// If there is no meta about the legacy conversion but there is meta 
about the timezone then we can infer the
+// file was written with the new rules.
+legacyConversion = "false";
+  } else {

Review comment:
   This `if` block makes the life of users on (3.1.2, 3.2.0) a bit easier 
since it automatically determines the appropriate conversion. It looks a bit 
weird, though, so we could possibly remove it and require users on these 
versions to set the respective property accordingly. I would prefer to keep the 
code more uniform rather than trying to cover edge cases.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
##
@@ -536,7 +542,8 @@ public void write(Object value) {
 Long int64value = ParquetTimestampUtils.getInt64(ts, timeUnit);
 recordConsumer.addLong(int64value);

Review comment:
   The fact that we do not perform/control legacy conversion when we store 
timestamps as the INT64 type can create problems if we end up comparing timestamps 
stored as INT96 and INT64. Shall we try to make the new property 
(`hive.parquet.timestamp.write.legacy.conversion.enabled`) also affect the 
INT64 storage type?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598120)
Time Spent: 50m  (was: 40m)

> Backward incompatible timestamp serialization in Parquet for certain timezones
> --
>
> Key: HIVE-25104
> URL: https://issues.apache.org/jira/browse/HIVE-25104
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HIVE-12192, HIVE-20007 changed the way that timestamp computations are 
> performed and to some extent how timestamps are serialized and deserialized 
> in files (Parquet, Avro, Orc).
> In versions that include HIVE-12192 or HIVE-20007 the serialization in 
> Parquet files is not backwards compatible. In other words writing timestamps 
> with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them 
> with another (not including the previous issues) may lead to different 
> results depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to 
> US/Pacific.
> At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> 

[jira] [Work logged] (HIVE-21935) Hive Vectorization : degraded performance with vectorize UDF

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21935?focusedWorklogId=598101=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598101
 ]

ASF GitHub Bot logged work on HIVE-21935:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:32
Start Date: 17/May/21 18:32
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2242:
URL: https://github.com/apache/hive/pull/2242#issuecomment-842075237


   > @abstractdog @rbalamohan are you ok with the latest patch?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598101)
Time Spent: 1.5h  (was: 1h 20m)

> Hive Vectorization : degraded performance with vectorize UDF  
> --
>
> Key: HIVE-21935
> URL: https://issues.apache.org/jira/browse/HIVE-21935
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.1.1
> Environment: Hive-3, JDK-8
>Reporter: Rajkumar Singh
>Assignee: Mustafa İman
>Priority: Major
>  Labels: performance, pull-request-available
> Attachments: CustomSplit-1.0-SNAPSHOT.jar
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> With vectorization turned on and hive.vectorized.adaptor.usage.mode=all we 
> were seeing severe performance degradation. Looking at the task jstacks, it 
> seems the code that vectorizes the UDF is running and stuck in some loop.
> {code:java}
> jstack -l 14954 | grep 0x3af0 -A20
> "TezChild" #15 daemon prio=5 os_prio=0 tid=0x7f157538d800 nid=0x3af0 
> runnable [0x7f1547581000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:573)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:205)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.ListIndexColScalar.evaluate(ListIndexColScalar.java:59)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:889)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> [yarn@hdp32b ~]$ jstack -l 14954 | grep 0x3af0 -A20
> "TezChild" #15 daemon prio=5 os_prio=0 tid=0x7f157538d800 nid=0x3af0 
> runnable [0x7f1547581000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.ensureSize(BytesColumnVector.java:554)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:570)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:205)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
>   at 
> 

[jira] [Work logged] (HIVE-25121) Fix qfile results due to disabling discovery.partitions

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25121?focusedWorklogId=598058=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598058
 ]

ASF GitHub Bot logged work on HIVE-25121:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:28
Start Date: 17/May/21 18:28
Worklog Time Spent: 10m 
  Work Description: yongzhi merged pull request #2279:
URL: https://github.com/apache/hive/pull/2279


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598058)
Time Spent: 40m  (was: 0.5h)

> Fix qfile results due to disabling discovery.partitions
> ---
>
> Key: HIVE-25121
> URL: https://issues.apache.org/jira/browse/HIVE-25121
> Project: Hive
>  Issue Type: Bug
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After the patch for HIVE-25039 is merged, some other tests should be updated 
> as well.
> There are three qfile tests failing now:
>  # testCliDriver[alter_multi_part_table_to_iceberg] – 
> org.apache.hadoop.hive.cli.TestIcebergCliDriver
>  # testCliDriver[alter_part_table_to_iceberg] – 
> org.apache.hadoop.hive.cli.TestIcebergCliDriver
>  # testCliDriver[create_table_explain_ddl] – 
> org.apache.hadoop.hive.cli.split5.TestMiniLlapLocalCliDriver



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25037) Create metric: Number of tables with > x aborts

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25037?focusedWorklogId=598068=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598068
 ]

ASF GitHub Bot logged work on HIVE-25037:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:29
Start Date: 17/May/21 18:29
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on pull request #2199:
URL: https://github.com/apache/hive/pull/2199#issuecomment-842140543


   rebased


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598068)
Time Spent: 1h  (was: 50m)

> Create metric: Number of tables with > x aborts
> ---
>
> Key: HIVE-25037
> URL: https://issues.apache.org/jira/browse/HIVE-25037
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Create a metric about the number of tables with > x aborts.
> x should be configurable and default to 1500.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24802) Show operation log at webui

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24802?focusedWorklogId=598046=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598046
 ]

ASF GitHub Bot logged work on HIVE-24802:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:27
Start Date: 17/May/21 18:27
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633107448



##
File path: ql/src/java/org/apache/hadoop/hive/ql/session/OperationLog.java
##
@@ -73,10 +74,10 @@ public OperationLog(String name, File file, HiveConf 
hiveConf) {
   opLoggingLevel = LoggingLevel.UNKNOWN;
 }
 
+isRemoveLogs = 
hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS);

Review comment:
   Done: the HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED check is now applied when Hive 
is not in test mode. Thanks for the review!

##
File path: 
service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that is different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)}.
+ * This avoids the operation log being cleaned up when the session/operation is closed (refer to
+ * {@link HiveSessionImpl#close()}), so users or administrators can still get the operation log
+ * afterwards for optimization and for investigating problems with the operation.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *- sessionId
+ *- queryId (the operation log file)
+ * 
+ * while the original tree looks like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *- sessionId
+ *- queryId (the operation log file)
+ * 
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner}:
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can
+ * no longer be reached on the webui, and removes its log. If the operation log session directory
+ * has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to clean up the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = 
LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+
+  public OperationLogManager(SessionManager sessionManager, HiveConf hiveConf) 
{
+this.operationManager = sessionManager.getOperationManager();
+this.hiveConf = hiveConf;
+this.sessionManager = sessionManager;
+if (HiveConf.getBoolVar(hiveConf, 

[jira] [Work logged] (HIVE-25107) Classpath logging should be on DEBUG level

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25107?focusedWorklogId=598042=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598042
 ]

ASF GitHub Bot logged work on HIVE-25107:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:27
Start Date: 17/May/21 18:27
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #2271:
URL: https://github.com/apache/hive/pull/2271


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598042)
Time Spent: 50m  (was: 40m)

> Classpath logging should be on DEBUG level
> --
>
> Key: HIVE-25107
> URL: https://issues.apache.org/jira/browse/HIVE-25107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This has been the case since HIVE-21584.
> I have a *72M* llap executor log file; when I grepped it for only "thread class 
> path" and piped the matches into a separate file, the result was a *22M* file: 1/3-1/4 of 
> the file was classpath info, which is not useful most of the time. This 
> overwhelming amount of classpath info is not needed; assuming that classpath 
> issues are reproducible with more or less effort, the user should be responsible 
> for turning on this expensive logging on demand. Not to mention the performance 
> implications, which cannot be ignored beyond a certain amount of log messages.
> https://github.com/apache/hive/commit/a234475faa2cab2606f2a74eb9ca071f006998e2#diff-44b2ff3a3c4a6cfcaed0fcb40b74031844f8586e40a6f8261637e5ebcd558b73R4577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24840) Materialized View incremental rebuild produces wrong result set after compaction

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24840?focusedWorklogId=598033=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598033
 ]

ASF GitHub Bot logged work on HIVE-24840:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:26
Start Date: 17/May/21 18:26
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #2280:
URL: https://github.com/apache/hive/pull/2280


   
   
   ### What changes were proposed in this pull request?
   Call `isSourceTablesCompacted` instead of `isSetSourceTablesCompacted` when 
validating materialized views. The latter method only checks whether the property is 
set (non-null), not what its value is.
   
   ### Why are the changes needed?
   The `isSetSourceTablesCompacted` method only checks whether the property value should 
be treated as null or not, while the validation needs the actual non-null value here (see the sketch below).
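   
   For context, here is a minimal sketch of the difference, following the usual conventions of 
   Thrift-generated Java structs (the class and field names below are illustrative, not the 
   actual Hive metastore API):
   
   ```java
   // Sketch of a Thrift-style struct with a boolean field plus a presence flag.
   public class MaterializationInfo {
     private boolean sourceTablesCompacted;      // the value itself
     private boolean sourceTablesCompactedIsSet; // presence flag maintained by the setter
   
     public boolean isSourceTablesCompacted() {  // returns the actual value
       return sourceTablesCompacted;
     }
   
     public boolean isSetSourceTablesCompacted() { // only reports whether the field was ever assigned
       return sourceTablesCompactedIsSet;
     }
   
     public void setSourceTablesCompacted(boolean value) {
       this.sourceTablesCompacted = value;
       this.sourceTablesCompactedIsSet = true;
     }
   }
   ```
   
   `isSetSourceTablesCompacted()` returns true whenever the field has been assigned, regardless 
   of whether it was assigned true or false, so the validation has to use the value accessor 
   instead of the presence check.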
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests 
-Dtest=TestMiniLlapLocalCliDriver -Dqfile=materialized_view_create_rewrite_4.q 
-pl itests/qtest -Pitests
   mvn test -Dtest=TestMaterializedViewRebuild -pl itests/hive-unit -Pitests
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598033)
Time Spent: 2h 40m  (was: 2.5h)

> Materialized View incremental rebuild produces wrong result set after 
> compaction
> 
>
> Key: HIVE-24840
> URL: https://issues.apache.org/jira/browse/HIVE-24840
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, 
> NULL);
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as 
> select a,b,c from t1 where a > 0 or a is null;
> delete from t1 where a = 1;
> alter table t1 compact 'major';
> -- Wait until compaction finished.
> alter materialized view mat1 rebuild;
> {code}
> Expected result of query
> {code}
> select * from mat1;
> {code}
> {code}
> 2 two 2
> NULL NULL NULL
> {code}
> but if incremental rebuild is enabled the result is
> {code}
> 1 one 1
> 2 two 2
> NULL NULL NULL
> {code}
> Cause: The incremental rebuild queries the metastore COMPLETED_TXN_COMPONENTS table to find 
> out whether the source tables of a materialized view have had delete or update transactions 
> since the last rebuild. However, when a major compaction is performed on the source tables, 
> the records related to these tables are deleted from COMPLETED_TXN_COMPONENTS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25104?focusedWorklogId=597958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597958
 ]

ASF GitHub Bot logged work on HIVE-25104:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:16
Start Date: 17/May/21 18:16
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request #2282:
URL: https://github.com/apache/hive/pull/2282


   ### What changes were proposed in this pull request?
   
   1. Add new read/write config properties to control legacy zone conversions 
in Parquet.
   2. Deprecate the hive.parquet.timestamp.legacy.conversion.enabled property since 
it is not clear whether it applies to conversion during read or write.
   3. Exploit file metadata and property to choose between new/old conversion 
rules.
   4. Update existing tests to remove usages of now deprecated 
hive.parquet.timestamp.legacy.conversion.enabled property.
   5. Simplify NanoTimeUtils#getTimestamp & NanoTimeUtils#getNanoTime by 
removing 'skipConversion' parameter
   
   ### Why are the changes needed?
   1. Provide end-users the ability to write backward-compatible 
timestamps in Parquet files so that the files can be read correctly by older 
versions.
   2. Improve code readability of NanoTimeUtils APIs.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   1. Add timestamp read/write compatibility test with Hive2 Parquet APIs 
(`TestParquetTimestampsHive2Compatibility`)
   2. Add qtest writing timestamps in Parquet using legacy zone conversions 
(`parquet_int96_legacy_compatibility_timestamp.q`)
   ```
   mvn test -Dtest=*Timestamp*
   cd itests/qtest
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex=".*timestamp.*" 
-Dtest.output.overwrite
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597958)
Time Spent: 40m  (was: 0.5h)

> Backward incompatible timestamp serialization in Parquet for certain timezones
> --
>
> Key: HIVE-25104
> URL: https://issues.apache.org/jira/browse/HIVE-25104
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVE-12192 and HIVE-20007 changed the way that timestamp computations are 
> performed and, to some extent, how timestamps are serialized and deserialized 
> in files (Parquet, Avro, Orc).
> In versions that include HIVE-12192 or HIVE-20007 the serialization in 
> Parquet files is not backwards compatible. In other words writing timestamps 
> with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them 
> with another (not including the previous issues) may lead to different 
> results depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to 
> US/Pacific.
> At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> |1|1880-01-01 00:00:00|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> SELECT * FROM employee;
> {code}
> |1|1879-12-31 23:52:58|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.
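A minimal, self-contained sketch of where such a shift can come from (this only illustrates the
time-zone rules involved; it is not the Hive code path itself): java.time consults the full tz
database history, in which Los Angeles was on local mean time (UTC-07:52:58) until 1883, while
the zone's modern "raw" offset is a flat -8 hours. The 7 minute 2 second gap between the two
matches the 1879-12-31 23:52:58 value in the table above.
{code:java}
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.TimeZone;

public class PacificOffset1880 {
  public static void main(String[] args) {
    ZoneId zone = ZoneId.of("America/Los_Angeles");
    LocalDateTime ts = LocalDateTime.of(1880, 1, 1, 0, 0, 0);

    // Historical rules: in 1880 Los Angeles used local mean time, UTC-07:52:58.
    System.out.println(ts.atZone(zone).getOffset());                              // -07:52:58

    // Modern raw offset of the zone: -8 hours.
    System.out.println(TimeZone.getTimeZone(zone).getRawOffset() / 3_600_000.0);  // -8.0
  }
}
{code}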



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=598030=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598030
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:25
Start Date: 17/May/21 18:25
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r633483880



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##
@@ -138,6 +148,17 @@ public void initialize(@Nullable Configuration 
configuration, Properties serDePr
 }
   }
 
+  private void createTableForCTAS(Configuration configuration, Properties 
serDeProperties) {
+serDeProperties.setProperty(TableProperties.ENGINE_HIVE_ENABLED, "true");
+serDeProperties.setProperty(InputFormatConfig.TABLE_SCHEMA, 
SchemaParser.toJson(tableSchema));
+Catalogs.createTable(configuration, serDeProperties);
+// set these in the global conf so that we can rollback the table in the 
lifecycle hook in case of failures

Review comment:
   A good candidate to put something into `QueryInfo`, or somewhere else

##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -514,6 +519,75 @@ public void testInsertOverwritePartitionedTable() throws 
IOException {
 HiveIcebergTestUtils.validateData(table, expected, 0);
   }
 
+  @Test
+  public void testCTASFromHiveTable() {
+Assume.assumeTrue("CTAS target table is supported only for HiveCatalog 
tables",
+testTableType == TestTables.TestTableType.HIVE_CATALOG);

Review comment:
   Why? What is blocking us?

##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -514,6 +519,75 @@ public void testInsertOverwritePartitionedTable() throws 
IOException {
 HiveIcebergTestUtils.validateData(table, expected, 0);
   }
 
+  @Test
+  public void testCTASFromHiveTable() {
+Assume.assumeTrue("CTAS target table is supported only for HiveCatalog 
tables",
+testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+shell.executeStatement("CREATE TABLE source (id bigint, name string) 
PARTITIONED BY (dept string) STORED AS ORC");
+shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'HR'), (2, 
'Linda', 'Finance')");
+
+shell.executeStatement(String.format(
+"CREATE TABLE target STORED BY '%s' %s TBLPROPERTIES ('%s'='%s') AS 
SELECT * FROM source",
+HiveIcebergStorageHandler.class.getName(),
+testTables.locationForCreateTableSQL(TableIdentifier.of("default", 
"target")),
+TableProperties.DEFAULT_FILE_FORMAT, fileFormat));
+
+List objects = shell.executeStatement("SELECT * FROM target 
ORDER BY id");
+Assert.assertEquals(2, objects.size());
+Assert.assertArrayEquals(new Object[]{1L, "Mike", "HR"}, objects.get(0));
+Assert.assertArrayEquals(new Object[]{2L, "Linda", "Finance"}, 
objects.get(1));
+  }
+
+  @Test
+  public void testCTASFromDifferentIcebergCatalog() {

Review comment:
   Would it be better placed here 
`TestHiveIcebergStorageHandlerWithMultipleCatalogs`?

##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveShell.java
##
@@ -216,6 +216,9 @@ private HiveConf initializeConf() {
 // enables vectorization on Tez
 hiveConf.set("tez.mrreader.config.update.properties", 
"hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids");
 
+// set lifecycle hooks
+hiveConf.setVar(HiveConf.ConfVars.HIVE_QUERY_LIFETIME_HOOKS, 
HiveIcebergCTASHook.class.getName());

Review comment:
   I have seen several hooks already. Maybe we would like to keep them as a 
single class?

##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveShell.java
##
@@ -216,6 +216,9 @@ private HiveConf initializeConf() {
 // enables vectorization on Tez
 hiveConf.set("tez.mrreader.config.update.properties", 
"hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids");
 
+// set lifecycle hooks
+hiveConf.setVar(HiveConf.ConfVars.HIVE_QUERY_LIFETIME_HOOKS, 
HiveIcebergCTASHook.class.getName());

Review comment:
   I have seen another Iceberg hook already. Maybe we would like to keep 
them as a single class?
   Maybe even if they implement different interfaces? Just an idea 
I am playing around with.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -361,7 +361,7 @@ private void collectCommitInformation(TezWork work) throws 
IOException, TezExcep
   .filter(name -> 
name.endsWith("HiveIcebergNoJobCommitter")).isPresent();
   // 

[jira] [Work logged] (HIVE-21935) Hive Vectorization : degraded performance with vectorize UDF

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21935?focusedWorklogId=598027=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598027
 ]

ASF GitHub Bot logged work on HIVE-21935:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:25
Start Date: 17/May/21 18:25
Worklog Time Spent: 10m 
  Work Description: abstractdog removed a comment on pull request #2242:
URL: https://github.com/apache/hive/pull/2242#issuecomment-842075237


   > @abstractdog @rbalamohan are you ok with the latest patch?
   
   
   
   > @abstractdog @rbalamohan are you ok with the latest patch?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 598027)
Time Spent: 1h 20m  (was: 1h 10m)

> Hive Vectorization : degraded performance with vectorize UDF  
> --
>
> Key: HIVE-21935
> URL: https://issues.apache.org/jira/browse/HIVE-21935
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.1.1
> Environment: Hive-3, JDK-8
>Reporter: Rajkumar Singh
>Assignee: Mustafa İman
>Priority: Major
>  Labels: performance, pull-request-available
> Attachments: CustomSplit-1.0-SNAPSHOT.jar
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> With vectorization turned on and hive.vectorized.adaptor.usage.mode=all we 
> were seeing severe performance degradation. Looking at the task jstacks, it 
> seems it is running the code which vectorizes the UDF and is stuck in some loop.
> {code:java}
> jstack -l 14954 | grep 0x3af0 -A20
> "TezChild" #15 daemon prio=5 os_prio=0 tid=0x7f157538d800 nid=0x3af0 
> runnable [0x7f1547581000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:573)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:205)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.ListIndexColScalar.evaluate(ListIndexColScalar.java:59)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:889)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> [yarn@hdp32b ~]$ jstack -l 14954 | grep 0x3af0 -A20
> "TezChild" #15 daemon prio=5 os_prio=0 tid=0x7f157538d800 nid=0x3af0 
> runnable [0x7f1547581000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.ensureSize(BytesColumnVector.java:554)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:570)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:205)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
>   at 
> 

[jira] [Work logged] (HIVE-25121) Fix qfile results due to disabling discovery.partitions

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25121?focusedWorklogId=597977=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597977
 ]

ASF GitHub Bot logged work on HIVE-25121:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:19
Start Date: 17/May/21 18:19
Worklog Time Spent: 10m 
  Work Description: hsnusonic opened a new pull request #2279:
URL: https://github.com/apache/hive/pull/2279


   
   
   ### What changes were proposed in this pull request?
   
   Fix qfile output
   
   ### Why are the changes needed?
   
   Currently, three pre-commit tests are failing after disabling 
discovery.partitions.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   No new tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597977)
Time Spent: 0.5h  (was: 20m)

> Fix qfile results due to disabling discovery.partitions
> ---
>
> Key: HIVE-25121
> URL: https://issues.apache.org/jira/browse/HIVE-25121
> Project: Hive
>  Issue Type: Bug
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> After the patch for HIVE-25039 is merged, some other tests should be updated 
> as well.
> There are three qfile tests failing now:
>  # testCliDriver[alter_multi_part_table_to_iceberg] – 
> org.apache.hadoop.hive.cli.TestIcebergCliDriver
>  # testCliDriver[alter_part_table_to_iceberg] – 
> org.apache.hadoop.hive.cli.TestIcebergCliDriver
>  # testCliDriver[create_table_explain_ddl] – 
> org.apache.hadoop.hive.cli.split5.TestMiniLlapLocalCliDriver



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24924) Optimize checkpointing flow in incremental load

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24924?focusedWorklogId=597972=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597972
 ]

ASF GitHub Bot logged work on HIVE-24924:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:18
Start Date: 17/May/21 18:18
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2105:
URL: https://github.com/apache/hive/pull/2105#discussion_r633253228



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplTxnTask.java
##
@@ -53,17 +53,6 @@ public int execute() {
 String tableName = work.getTableName();
 ReplicationSpec replicationSpec = work.getReplicationSpec();
 if ((tableName != null) && (replicationSpec != null)) {
-  Table tbl;

Review comment:
   This is checking the replacement info from the table below -
   ```
   tbl = Hive.get().getTable(work.getDbName(), tableName);
   if (!replicationSpec.allowReplacementInto(tbl.getParameters())) {
   ```
   
   But we don't track at table level now, so I removed it. The earlier logic was: if 
the table is not found, the catch block used to check at the db level
   ```
} catch (InvalidTableException e) {
   // In scenarios like import to mm tables, the alloc write id event 
is generated before create table event.
   try {
 Database database = Hive.get().getDatabase(work.getDbName());
 if 
(!replicationSpec.allowReplacementInto(database.getParameters())) {
   ```
   So, I have kept it to always check at the database level




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597972)
Time Spent: 50m  (was: 40m)

> Optimize checkpointing flow in incremental load
> ---
>
> Key: HIVE-24924
> URL: https://issues.apache.org/jira/browse/HIVE-24924
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Attempt reducing alter calls for checkpointing during repl load



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=597970=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597970
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:18
Start Date: 17/May/21 18:18
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r633453402



##
File path: 
ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt
##
@@ -34,20 +34,17 @@ public class  extends VectorExpression {
 
   private static final long serialVersionUID = 1L;
 
-  private final int colNum1;
   private final int colNum2;

Review comment:
   what do you think about this @ramesh0201?
   I think a general input col array would be nice (option b) )
   
   however, there are some rare cases where it's not obvious which position should 
be used, but it's up to agreement, e.g.:
   IfExprScalarColumn.txt
   ```
   protected final int arg1Column;
   protected final  arg2Scalar;
   protected final int arg3Column;
   ```
   this is tricky because there is a scalar interleaved between the columns; the input 
col array might look like:
   1. new int[] { arg1Column, -1, arg3Column};
   to emphasize that the second argument is a scalar, so we'll refactor as:
   ```
   arg3Column => inputColumnNums[2]
   ```
   
   2. new int[] { arg1Column, arg3Column, -1};
   to ignore the fact that there is an interleaved scalar input, so we'll 
refactor as:
   ```
   arg3Column => inputColumnNums[1]
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597970)
Time Spent: 2h 50m  (was: 2h 40m)

> Vectorization: Support PTF - bounded start windows
> --
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> {code}
>  notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is 
> supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we 
> simply remove the compile-time check:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
>   if (!windowFrameDef.isStartUnbounded()) {
> setOperatorIssue(functionName + " only UNBOUNDED start frame is 
> supported");
> return false;
>   }
> {code}
> We get incorrect results because the vectorized codepath completely 
> ignores boundaries and simply iterates through all the input batches in 
> [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
> evaluator.doLastBatchWork();
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25079) Create new metric about number of writes to tables with manually disabled compaction

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25079?focusedWorklogId=597959=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597959
 ]

ASF GitHub Bot logged work on HIVE-25079:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:16
Start Date: 17/May/21 18:16
Worklog Time Spent: 10m 
  Work Description: asinkovits opened a new pull request #2281:
URL: https://github.com/apache/hive/pull/2281


   …anually disabled compaction
   
   
   
   ### What changes were proposed in this pull request?
   
   Creates a new metric that measures the number of writes to tables that have 
compaction turned off manually.
   
   ### Why are the changes needed?
   
   Subtask is part of the compaction observability initiative.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597959)
Time Spent: 20m  (was: 10m)

> Create new metric about number of writes to tables with manually disabled 
> compaction
> 
>
> Key: HIVE-25079
> URL: https://issues.apache.org/jira/browse/HIVE-25079
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Create a new metric that measures the number of writes to tables that have 
> compaction turned off manually. It does not matter if the write is committed 
> or aborted (both are bad...)
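As a rough illustration of what such a metric could look like, here is a minimal sketch assuming
a Codahale-style MetricRegistry and that compaction is disabled via the "no_auto_compaction"
table property; the class, method, and metric names are hypothetical and not the actual Hive
implementation:
{code:java}
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import java.util.Map;

public class CompactionDisabledWriteMetric {
  private final Counter writesToCompactionDisabledTables;

  public CompactionDisabledWriteMetric(MetricRegistry registry) {
    // Hypothetical metric name.
    this.writesToCompactionDisabledTables =
        registry.counter("compaction_num_writes_to_disabled_compaction_tables");
  }

  /** Called once per write; committed and aborted writes both count, as the description notes. */
  public void onTableWrite(Map<String, String> tableParameters) {
    if (Boolean.parseBoolean(tableParameters.getOrDefault("no_auto_compaction", "false"))) {
      writesToCompactionDisabledTables.inc();
    }
  }
}
{code}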



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25107) Classpath logging should be on DEBUG level

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25107?focusedWorklogId=597948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597948
 ]

ASF GitHub Bot logged work on HIVE-25107:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:15
Start Date: 17/May/21 18:15
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2271:
URL: https://github.com/apache/hive/pull/2271#issuecomment-842226170


   merged, thanks for the review @pgaref 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597948)
Time Spent: 40m  (was: 0.5h)

> Classpath logging should be on DEBUG level
> --
>
> Key: HIVE-25107
> URL: https://issues.apache.org/jira/browse/HIVE-25107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This has been the case since HIVE-21584.
> I have a *72M* llap executor log file; when I grepped it for only "thread class 
> path" and piped the matches into a separate file, the result was a *22M* file: 1/3-1/4 of 
> the file was classpath info, which is not useful most of the time. This 
> overwhelming amount of classpath info is not needed; assuming that classpath 
> issues are reproducible with more or less effort, the user should be responsible 
> for turning on this expensive logging on demand. Not to mention the performance 
> implications, which cannot be ignored beyond a certain amount of log messages.
> https://github.com/apache/hive/commit/a234475faa2cab2606f2a74eb9ca071f006998e2#diff-44b2ff3a3c4a6cfcaed0fcb40b74031844f8586e40a6f8261637e5ebcd558b73R4577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=597912=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597912
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:10
Start Date: 17/May/21 18:10
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r633722120



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -342,7 +343,12 @@ public static PathFilter getBootstrapDirectoryFilter(final 
FileSystem fs) {
 
   public static int handleException(boolean isReplication, Throwable e, String 
nonRecoverablePath,
 ReplicationMetricCollector 
metricCollector, String stageName, HiveConf conf){
-int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+int errorCode;
+if (isReplication && e instanceof SnapshotException) {
+  errorCode = ErrorMsg.getErrorMsg("SNAPSHOT_ERROR").getErrorCode();

Review comment:
   Yes, it will be preserved. The entire stack trace is written here: 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java#L353
   
   and the exception is already set above, example:
   
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java#L141
   
   Every caller sets the exception beforehand and calls this method only to get the 
error code, which is used to decide whether the error is recoverable or non-recoverable.
   
   And the trace is also propagated back, example from `testFailureScenarios()`:
   
   ```
   Caused by: org.apache.hadoop.hdfs.protocol.SnapshotException: Nested 
snapshottable directories not allowed: 
path=/testFailureScenariossource1/tablesource, the ancestor 
/testFailureScenariossource1 is already a snapshottable directory.
   at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.checkNestedSnapshottable(SnapshotManager.java:174)
   at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.setSnapshottable(SnapshotManager.java:189)
   at 
org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.allowSnapshot(FSDirSnapshotOp.java:62)
   at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.allowSnapshot(FSNamesystem.java:6366)
   at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.allowSnapshot(NameNodeRpcServer.java:1842)
   at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.allowSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1211)
   at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
   
   ```

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -217,7 +224,12 @@ public int execute() {
   throw e;
 } catch (Exception e) {
   setException(e);
-  int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+  int errorCode;
+  if (e instanceof SnapshotException) {
+errorCode = ErrorMsg.getErrorMsg("SNAPSHOT_ERROR").getErrorCode();

Review comment:
   Answered here:
   https://github.com/apache/hive/pull/2043#discussion_r633722120

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -675,6 +675,16 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 + " table or partition level. If hive.exec.parallel \n"
 + "is set to true then max worker threads created for copy can be 
hive.exec.parallel.thread.number(determines \n"
 + "number of copy tasks in parallel) * hive.repl.parallel.copy.tasks 
"),
+
REPL_SNAPSHOT_DIFF_FOR_EXTERNAL_TABLE_COPY("hive.repl.externaltable.snapshotdiff.copy",
+false,"Use snapshot diff for copying data from source to "
++ "destination cluster for external table in distcp. If true it uses 
snapshot based distcp for all the paths "
++ "configured as part of hive.repl.external.warehouse.single.copy.task 
along with the external warehouse "
++ "default location."),
+
REPL_SNAPSHOT_OVERWRITE_TARGET_FOR_EXTERNAL_TABLE_COPY("hive.repl.externaltable.snapshot.overwrite.target",

Review comment:
   It is being handled as part of the new config,
   ```
   
REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK_PATHS("hive.repl.external.warehouse.single.copy.task.paths",
   ```




-- 
This is an automated message from the Apache 

[jira] [Updated] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs

2021-05-17 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25127:
--
Summary: Remove Thrift Exceptions From RawStore getCatalogs  (was: Update 
getCatalogs)

> Remove Thrift Exceptions From RawStore getCatalogs
> --
>
> Key: HIVE-25127
> URL: https://issues.apache.org/jira/browse/HIVE-25127
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24924) Optimize checkpointing flow in incremental load

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24924?focusedWorklogId=597892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597892
 ]

ASF GitHub Bot logged work on HIVE-24924:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:08
Start Date: 17/May/21 18:08
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2105:
URL: https://github.com/apache/hive/pull/2105#discussion_r633215747



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplTxnTask.java
##
@@ -53,17 +53,6 @@ public int execute() {
 String tableName = work.getTableName();
 ReplicationSpec replicationSpec = work.getReplicationSpec();
 if ((tableName != null) && (replicationSpec != null)) {
-  Table tbl;

Review comment:
   we dont need the check here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597892)
Time Spent: 40m  (was: 0.5h)

> Optimize checkpointing flow in incremental load
> ---
>
> Key: HIVE-24924
> URL: https://issues.apache.org/jira/browse/HIVE-24924
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Attempt reducing alter calls for checkpointing during repl load



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24802) Show operation log at webui

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24802?focusedWorklogId=597889=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597889
 ]

ASF GitHub Bot logged work on HIVE-24802:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:08
Start Date: 17/May/21 18:08
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633086123



##
File path: ql/src/java/org/apache/hadoop/hive/ql/session/OperationLog.java
##
@@ -73,10 +74,10 @@ public OperationLog(String name, File file, HiveConf 
hiveConf) {
   opLoggingLevel = LoggingLevel.UNKNOWN;
 }
 
+isRemoveLogs = 
hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS);

Review comment:
   I would rather keep these configurations independent:
   - `HIVE_TESTING_REMOVE_LOGS`
   - `HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED`
   
   And if any of them are set to true, then do not remove the logs

##
File path: 
service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Move the operation log into another log location that is different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)}.
+ * This avoids the operation log being cleaned up when the session/operation is closed (refer to
+ * {@link HiveSessionImpl#close()}), so users or administrators can still get the operation log
+ * afterwards for optimization and for investigating problems with the operation.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *- sessionId
+ *- queryId (the operation log file)
+ * 
+ * while the original tree looks like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *- sessionId
+ *- queryId (the operation log file)
+ * 
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner}:
+ * it gets all query info stored in {@link QueryInfoCache}, searches for the query info that can
+ * no longer be reached on the webui, and removes its log. If the operation log session directory
+ * has no operation log under it and the session is dead,
+ * then the OperationLogDirCleaner will try to clean up the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = 
LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;

Review comment:
   These are not final. I think we usually try to reserve this naming 
format for `static final` variables

##
File path: 
service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license 

[jira] [Updated] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25126:
--
Description: 
Remove all references to 
NoSuchObjectException/InvalidOperationException/MetaException from the method 
signature of RawStore.  These Exceptions are generated by Thrift and are used 
to communicate error conditions across the wire.  They are not designed for use 
as part of the underlying stack, yet in Hive, they have been pushed down into 
these data access operators. 

 

The RawStore should not have to be this tightly coupled to the transport layer.

 

Remove all checked Exceptions from RawStore in favor of Hive runtime 
exceptions.  This is a popular pattern and is used by (and therefore dovetails 
nicely with) the underlying database access library DataNucleus.

All of the logging of un-checked Exceptions, and transforming them into Thrift 
exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).

 

The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
assume that the backing data store could really be anything.  With that said, I 
would say there are two phases of this:

 
 # Remove Thrift Exception to decouple from Thrift
 # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus

 

Item number 2 is required because DataNucleus throws a lot of unchecked 
runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
that many of these exceptions are bubbled up to the caller, thus forcing the 
caller to handle different exceptions depending on the data source (though they 
only see a {{RawStore}}).  The calling code should only have to deal with Hive 
exceptions and be shielded from the underlying data storage layer.

  was:
Remove all references to 
NoSuchObjectException/InvalidOperationException/MetaException from the method 
signature of RawStore.  These Exceptions are generated by Thrift and are used 
to communicate error conditions across the wire.  They are not designed for use 
as part of the underlying stack, yet in Hive, they have been pushed down into 
these data access operators. 

 

The RawStore should not have to be this tightly coupled to the transport layer.

 

Remove all checked Exceptions from RawStore in favor of Hive runtime 
exceptions.  This is a popular format and is used (and therefore dovetails 
nicely) with the underlying database access library DataNucleaus.

All of the logging of un-checked Exceptions, and transforming them into Thrift 
exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).


> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a popular pattern and is used by (and therefore dovetails 
> nicely with) the underlying database access library DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove Thrift Exception to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
>  
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller, thus forcing the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be shielded from the underlying data storage layer.
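A minimal sketch of the proposed layering (all names are hypothetical except MetaException and
NoSuchObjectException, which the description names as the Thrift-generated exceptions; the
placeholder classes below only stand in for them so the sketch compiles on its own):
{code:java}
// The data-access layer throws unchecked Hive exceptions and knows nothing about Thrift.
class CatalogNotFoundException extends RuntimeException {
  CatalogNotFoundException(String name) { super("No such catalog: " + name); }
}

interface RawStoreLike {                 // stand-in for RawStore after the change
  Catalog getCatalog(String name);       // note: no Thrift exceptions in the signature
}

// Only the Hive-Thrift bridge (the HMSHandler role) translates to wire-level exceptions.
class ThriftBridge {
  private final RawStoreLike store;
  ThriftBridge(RawStoreLike store) { this.store = store; }

  Catalog getCatalogOverTheWire(String name) throws NoSuchObjectException, MetaException {
    try {
      return store.getCatalog(name);
    } catch (CatalogNotFoundException e) {
      throw new NoSuchObjectException(e.getMessage());   // translate at the bridge
    } catch (RuntimeException e) {                       // e.g. unchecked DataNucleus errors
      throw new MetaException(e.getMessage());
    }
  }
}

// Placeholders so the sketch is self-contained; the real types are Thrift-generated.
class Catalog { }
class NoSuchObjectException extends Exception { NoSuchObjectException(String m) { super(m); } }
class MetaException extends Exception { MetaException(String m) { super(m); } }
{code}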



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23931) Send ValidWriteIdList and tableId to get_*_constraints HMS APIs

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23931?focusedWorklogId=597856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597856
 ]

ASF GitHub Bot logged work on HIVE-23931:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:04
Start Date: 17/May/21 18:04
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on a change in pull 
request #2211:
URL: https://github.com/apache/hive/pull/2211#discussion_r633090240



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -701,6 +705,8 @@ struct UniqueConstraintsRequest {
   1: required string catName,
   2: required string db_name,
   3: required string tbl_name,
+  4: optional string validWriteIdList,
+  5: optional i64 tableId=-1

Review comment:
   @kgyrtkirk I initially thought the same. But making that change would break 
the metastore API contract, and calls from older HiveMetaStoreClient.java 
clients would no longer be compatible. That is why, instead of extracting the 
common fields into another struct, I added them to the same struct as optional 
fields.

##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -690,7 +692,9 @@ struct ForeignKeysRequest {
   2: string parent_tbl_name,
   3: string foreign_db_name,
   4: string foreign_tbl_name

Review comment:
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597856)
Time Spent: 1h 40m  (was: 1.5h)

> Send ValidWriteIdList and tableId to get_*_constraints HMS APIs
> ---
>
> Key: HIVE-23931
> URL: https://issues.apache.org/jira/browse/HIVE-23931
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Send ValidWriteIdList and tableId to get_*_constraints HMS APIs. This would 
> be required in order to decide whether the response should be served from the 
> Cache or backing DB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23571) [CachedStore] Add ValidWriteIdList to SharedCache.TableWrapper

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23571?focusedWorklogId=597852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597852
 ]

ASF GitHub Bot logged work on HIVE-23571:
-

Author: ASF GitHub Bot
Created on: 17/May/21 18:03
Start Date: 17/May/21 18:03
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on pull request #2128:
URL: https://github.com/apache/hive/pull/2128#issuecomment-841813660


   @kgyrtkirk Could you please review the PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597852)
Time Spent: 1h 50m  (was: 1h 40m)

> [CachedStore] Add ValidWriteIdList to SharedCache.TableWrapper
> --
>
> Key: HIVE-23571
> URL: https://issues.apache.org/jira/browse/HIVE-23571
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Add ValidWriteIdList to SharedCache.TableWrapper. This would be used in 
> deciding whether a given read request can be served from the cache or we have 
> to reload it from the backing database. 
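
For illustration, a rough sketch of the kind of freshness check this enables (the class 
name and the use of TxnIdUtils here are assumptions, not the actual CachedStore patch):

{code:java}
import org.apache.hadoop.hive.common.ValidWriteIdList;
import org.apache.hive.common.util.TxnIdUtils;

public class CacheFreshnessCheck {
  /**
   * Serve from the cache only when the writeIdList captured with the cached table
   * is equivalent to the one supplied with the read request.
   */
  public static boolean canServeFromCache(ValidWriteIdList cached, ValidWriteIdList requested) {
    return cached != null && requested != null
        && TxnIdUtils.checkEquivalentWriteIds(cached, requested);
  }
}
{code}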



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25127) Update getCatalogs

2021-05-17 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25127:
-


> Update getCatalogs
> --
>
> Key: HIVE-25127
> URL: https://issues.apache.org/jira/browse/HIVE-25127
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=597805=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597805
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 17/May/21 17:30
Start Date: 17/May/21 17:30
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r633723018



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -675,6 +675,16 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 + " table or partition level. If hive.exec.parallel \n"
 + "is set to true then max worker threads created for copy can be 
hive.exec.parallel.thread.number(determines \n"
 + "number of copy tasks in parallel) * hive.repl.parallel.copy.tasks 
"),
+
REPL_SNAPSHOT_DIFF_FOR_EXTERNAL_TABLE_COPY("hive.repl.externaltable.snapshotdiff.copy",
+false,"Use snapshot diff for copying data from source to "
++ "destination cluster for external table in distcp. If true it uses 
snapshot based distcp for all the paths "
++ "configured as part of hive.repl.external.warehouse.single.copy.task 
along with the external warehouse "
++ "default location."),
+
REPL_SNAPSHOT_OVERWRITE_TARGET_FOR_EXTERNAL_TABLE_COPY("hive.repl.externaltable.snapshot.overwrite.target",

Review comment:
   It is being handled as part of the new config,
   ```
   
REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK_PATHS("hive.repl.external.warehouse.single.copy.task.paths",
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597805)
Time Spent: 6h  (was: 5h 50m)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=597802=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597802
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 17/May/21 17:29
Start Date: 17/May/21 17:29
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r633722120



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -342,7 +343,12 @@ public static PathFilter getBootstrapDirectoryFilter(final 
FileSystem fs) {
 
   public static int handleException(boolean isReplication, Throwable e, String 
nonRecoverablePath,
 ReplicationMetricCollector 
metricCollector, String stageName, HiveConf conf){
-int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+int errorCode;
+if (isReplication && e instanceof SnapshotException) {
+  errorCode = ErrorMsg.getErrorMsg("SNAPSHOT_ERROR").getErrorCode();

Review comment:
   Yes, it will be preserved. The entire stack trace is written here: 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java#L353
   
   and the exception is already set above, example:
   
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java#L141
   
   Every caller sets the exception first and then calls this method just to get 
the error code, which is used to decide whether the error is recoverable or 
non-recoverable.
   
   And the trace is also propagated back, example from `testFailureScenarios()`:
   
   ```
   Caused by: org.apache.hadoop.hdfs.protocol.SnapshotException: Nested 
snapshottable directories not allowed: 
path=/testFailureScenariossource1/tablesource, the ancestor 
/testFailureScenariossource1 is already a snapshottable directory.
   at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.checkNestedSnapshottable(SnapshotManager.java:174)
   at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.setSnapshottable(SnapshotManager.java:189)
   at 
org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.allowSnapshot(FSDirSnapshotOp.java:62)
   at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.allowSnapshot(FSNamesystem.java:6366)
   at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.allowSnapshot(NameNodeRpcServer.java:1842)
   at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.allowSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1211)
   at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
   
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597802)
Time Spent: 5h 40m  (was: 5.5h)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=597803=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597803
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 17/May/21 17:29
Start Date: 17/May/21 17:29
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r633722496



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -217,7 +224,12 @@ public int execute() {
   throw e;
 } catch (Exception e) {
   setException(e);
-  int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+  int errorCode;
+  if (e instanceof SnapshotException) {
+errorCode = ErrorMsg.getErrorMsg("SNAPSHOT_ERROR").getErrorCode();

Review comment:
   Answered here:
   https://github.com/apache/hive/pull/2043#discussion_r633722120




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597803)
Time Spent: 5h 50m  (was: 5h 40m)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location

2021-05-17 Thread Thejas Nair (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346279#comment-17346279
 ] 

Thejas Nair commented on HIVE-24920:


For resiliency against issues caused by existing dirs with the same name, users 
should specify custom locations for the external tables (or otherwise ensure 
that the default dir doesn't exist). Another option is to use ACID managed 
tables (where you don't have cases of random dirs).

External tables by definition don't have their data under Hive's management. An 
HDFS-backed table is just one example; others include HBase and Kudu tables.

 

> TRANSLATED_TO_EXTERNAL tables may write to the same location
> 
>
> Key: HIVE-24920
> URL: https://issues.apache.org/jira/browse/HIVE-24920
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> create table t (a integer);
> insert into t values(1);
> alter table t rename to t2;
> create table t (a integer); -- I expected an exception from this command 
> (location already exists) but because its an external table no exception
> insert into t values(2);
> select * from t;  -- shows 1 and 2
> drop table t2;-- wipes out data location
> select * from t;  -- empty resultset
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25121) Fix qfile results due to disabling discovery.partitions

2021-05-17 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Wen Lai resolved HIVE-25121.
---
   Fix Version/s: 4.0.0
Target Version/s: 4.0.0
  Resolution: Fixed

> Fix qfile results due to disabling discovery.partitions
> ---
>
> Key: HIVE-25121
> URL: https://issues.apache.org/jira/browse/HIVE-25121
> Project: Hive
>  Issue Type: Bug
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> After the patch for HIVE-25039 was merged, some other tests need to be updated 
> as well.
> There are three qfile tests failing now.
>  # testCliDriver[alter_multi_part_table_to_iceberg] – 
> org.apache.hadoop.hive.cli.TestIcebergCliDriver
>  # testCliDriver[alter_part_table_to_iceberg] – 
> org.apache.hadoop.hive.cli.TestIcebergCliDriver
>  # testCliDriver[create_table_explain_ddl] – 
> org.apache.hadoop.hive.cli.split5.TestMiniLlapLocalCliDriver



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25121) Fix qfile results due to disabling discovery.partitions

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25121?focusedWorklogId=597768=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597768
 ]

ASF GitHub Bot logged work on HIVE-25121:
-

Author: ASF GitHub Bot
Created on: 17/May/21 16:30
Start Date: 17/May/21 16:30
Worklog Time Spent: 10m 
  Work Description: yongzhi merged pull request #2279:
URL: https://github.com/apache/hive/pull/2279


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597768)
Time Spent: 20m  (was: 10m)

> Fix qfile results due to disabling discovery.partitions
> ---
>
> Key: HIVE-25121
> URL: https://issues.apache.org/jira/browse/HIVE-25121
> Project: Hive
>  Issue Type: Bug
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> After the patch for HIVE-25039 was merged, some other tests need to be updated 
> as well.
> There are three qfile tests failing now.
>  # testCliDriver[alter_multi_part_table_to_iceberg] – 
> org.apache.hadoop.hive.cli.TestIcebergCliDriver
>  # testCliDriver[alter_part_table_to_iceberg] – 
> org.apache.hadoop.hive.cli.TestIcebergCliDriver
>  # testCliDriver[create_table_explain_ddl] – 
> org.apache.hadoop.hive.cli.split5.TestMiniLlapLocalCliDriver



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones

2021-05-17 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-25104:
---
Affects Version/s: (was: 3.1.2)
   3.1.0

> Backward incompatible timestamp serialization in Parquet for certain timezones
> --
>
> Key: HIVE-25104
> URL: https://issues.apache.org/jira/browse/HIVE-25104
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-12192, HIVE-20007 changed the way that timestamp computations are 
> performed and to some extent how timestamps are serialized and deserialized 
> in files (Parquet, Avro, Orc).
> In versions that include HIVE-12192 or HIVE-20007 the serialization in 
> Parquet files is not backwards compatible. In other words writing timestamps 
> with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them 
> with another (not including the previous issues) may lead to different 
> results depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to 
> US/Pacific.
> At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> |1|1880-01-01 00:00:00|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> SELECT * FROM employee;
> {code}
> |1|1879-12-31 23:52:58|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24802) Show operation log at webui

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24802?focusedWorklogId=597739=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597739
 ]

ASF GitHub Bot logged work on HIVE-24802:
-

Author: ASF GitHub Bot
Created on: 17/May/21 15:39
Start Date: 17/May/21 15:39
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633643380



##
File path: 
service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Moves the operation log into a log location different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)}.
+ * This avoids the operation log being cleaned up when the session/operation is closed
+ * (refer to {@link HiveSessionImpl#close()}), so users or administrators can still
+ * retrieve the operation log to tune or investigate a finished operation.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *    - queryId (the operation log file)
+ *
+ * while the original tree looks like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *    - queryId (the operation log file)
+ *
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner}:
+ * it goes over the query info stored in {@link QueryInfoCache}, finds the query info that
+ * can no longer be reached from the web UI, and removes its log. If the operation log
+ * session directory has no operation log under it and the session is dead, then the
+ * OperationLogDirCleaner will also try to clean up the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = 
LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;

Review comment:
   done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597739)
Time Spent: 4.5h  (was: 4h 20m)

> Show operation log at webui
> ---
>
> Key: HIVE-24802
> URL: https://issues.apache.org/jira/browse/HIVE-24802
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Attachments: operationlog.png
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> 

[jira] [Work logged] (HIVE-24802) Show operation log at webui

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24802?focusedWorklogId=597737=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597737
 ]

ASF GitHub Bot logged work on HIVE-24802:
-

Author: ASF GitHub Bot
Created on: 17/May/21 15:39
Start Date: 17/May/21 15:39
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633643084



##
File path: 
service/src/java/org/apache/hive/service/cli/operation/OperationManager.java
##
@@ -85,10 +79,7 @@ public synchronized void init(HiveConf hiveConf) {
 LogDivertAppender.registerRoutingAppender(hiveConf);
 LogDivertAppenderForTest.registerRoutingAppenderIfInTest(hiveConf);
 
-if (hiveConf.isWebUiQueryInfoCacheEnabled()) {
-  historicalQueryInfos = new QueryInfoCache(
-hiveConf.getIntVar(ConfVars.HIVE_SERVER2_WEBUI_MAX_HISTORIC_QUERIES));
-}
+this.queryInfoCache = new QueryInfoCache(hiveConf);

Review comment:
   Yes, the cache will only be used when the WebUI is enabled; made it 
Optional here.
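   A rough sketch of the Optional wiring (based only on the diff above, not the final patch):
   ```java
   // Only materialize the cache when the WebUI query-info cache is enabled,
   // so callers have to handle the "no cache" case explicitly.
   private Optional<QueryInfoCache> queryInfoCache = Optional.empty();

   public synchronized void init(HiveConf hiveConf) {
     if (hiveConf.isWebUiQueryInfoCacheEnabled()) {
       this.queryInfoCache = Optional.of(new QueryInfoCache(hiveConf));
     }
     // ... rest of init unchanged
   }
   ```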

##
File path: 
service/src/java/org/apache/hive/service/cli/session/SessionManager.java
##
@@ -281,6 +284,7 @@ private void initOperationLogRootDir() {
 LOG.warn("Failed to schedule cleanup HS2 operation logging root dir: " 
+
 operationLogRootDir.getAbsolutePath(), e);
   }
+  logManager = new OperationLogManager(this, hiveConf);

Review comment:
   done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597737)
Time Spent: 4h 20m  (was: 4h 10m)

> Show operation log at webui
> ---
>
> Key: HIVE-24802
> URL: https://issues.apache.org/jira/browse/HIVE-24802
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Attachments: operationlog.png
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently we provide getQueryLog in HiveStatement to fetch the operation log, 
> and the operation log is deleted when the operation is closed (with a delay 
> for canceled operations). Sometimes it is not easy for users (JDBC) or 
> administrators to dig into the details of a finished (or failed) operation, so 
> we present the operation log on the web UI and keep the operation log for some 
> time for later analysis.
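
For reference, a minimal client-side sketch of fetching the operation log through the 
existing JDBC API mentioned above (the connection URL and query are placeholders):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hive.jdbc.HiveStatement;

public class QueryLogExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         HiveStatement stmt = (HiveStatement) conn.createStatement()) {
      stmt.execute("SELECT 1");
      // Operation log lines for this statement; they are gone once the operation is
      // closed, which is what the proposed web UI view and retention work around.
      for (String line : stmt.getQueryLog()) {
        System.out.println(line);
      }
    }
  }
}
{code}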



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24802) Show operation log at webui

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24802?focusedWorklogId=597735=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597735
 ]

ASF GitHub Bot logged work on HIVE-24802:
-

Author: ASF GitHub Bot
Created on: 17/May/21 15:37
Start Date: 17/May/21 15:37
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633642125



##
File path: 
service/src/java/org/apache/hive/service/cli/operation/OperationLogManager.java
##
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.cli.operation;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryInfo;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.OperationLog;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hive.service.cli.OperationHandle;
+import org.apache.hive.service.cli.session.HiveSession;
+import org.apache.hive.service.cli.session.HiveSessionImpl;
+import org.apache.hive.service.cli.session.SessionManager;
+
+/**
+ * Moves the operation log into a log location different from the dir created by
+ * {@link HiveSessionImpl#setOperationLogSessionDir(File)}.
+ * This avoids the operation log being cleaned up when the session/operation is closed
+ * (refer to {@link HiveSessionImpl#close()}), so users or administrators can still
+ * retrieve the operation log to tune or investigate a finished operation.
+ * The tree under the log location looks like:
+ * - ${@link SessionManager#operationLogRootDir}_historic
+ *    - sessionId
+ *    - queryId (the operation log file)
+ *
+ * while the original tree looks like:
+ * - ${@link SessionManager#operationLogRootDir}
+ *    - sessionId
+ *    - queryId (the operation log file)
+ *
+ * The lifecycle of the log is managed by a daemon called {@link OperationLogDirCleaner}:
+ * it goes over the query info stored in {@link QueryInfoCache}, finds the query info that
+ * can no longer be reached from the web UI, and removes its log. If the operation log
+ * session directory has no operation log under it and the session is dead, then the
+ * OperationLogDirCleaner will also try to clean up the session log directory.
+ */
+
+public class OperationLogManager {
+  private static final Logger LOG = 
LoggerFactory.getLogger(OperationLogManager.class);
+  private static final String HISTORIC_DIR_SUFFIX = "_historic";
+  private static String HISTORIC_OPERATION_LOG_ROOT_DIR;
+  private static long MAX_BYTES_TO_FETCH;
+
+  private final HiveConf hiveConf;
+  private final SessionManager sessionManager;
+  private final OperationManager operationManager;
+  private OperationLogDirCleaner cleaner;
+
+  public OperationLogManager(SessionManager sessionManager, HiveConf hiveConf) 
{
+this.operationManager = sessionManager.getOperationManager();
+this.hiveConf = hiveConf;
+this.sessionManager = sessionManager;
+if (HiveConf.getBoolVar(hiveConf, 
HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED)
+&& 
hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)
+&& hiveConf.isWebUiQueryInfoCacheEnabled()) {
+  initHistoricOperationLogRootDir();
+  MAX_BYTES_TO_FETCH = HiveConf.getSizeVar(hiveConf,
+  
HiveConf.ConfVars.HIVE_SERVER2_HISTORIC_OPERATION_LOG_FETCH_MAXBYTES);
+  if (HISTORIC_OPERATION_LOG_ROOT_DIR != null
+  && 

[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=597689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597689
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 17/May/21 14:28
Start Date: 17/May/21 14:28
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r633581183



##
File path: 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
##
@@ -1197,6 +1241,112 @@ public boolean runDistCp(List srcPaths, Path dst, 
Configuration conf) thro
 }
   }
 
+  @Override
+  public boolean runDistCpWithSnapshots(String oldSnapshot, String 
newSnapshot, List srcPaths, Path dst, Configuration conf)
+  throws IOException {
+DistCpOptions options =
+new DistCpOptions.Builder(srcPaths, 
dst).withSyncFolder(true).withUseDiff(oldSnapshot, newSnapshot)
+
.preserve(FileAttribute.BLOCKSIZE).preserve(FileAttribute.XATTR).build();
+
+List params = constructDistCpWithSnapshotParams(srcPaths, dst, 
oldSnapshot, newSnapshot, conf, "-diff");
+try {
+  conf.setBoolean("mapred.mapper.new-api", true);
+  DistCp distcp = new DistCp(conf, options);
+  int returnCode = distcp.run(params.toArray(new String[0]));
+  if (returnCode == 0) {
+return true;
+  } else if (returnCode == DistCpConstants.INVALID_ARGUMENT) {
+// Handling FileNotFoundException: if the source got deleted, we don't want to copy either,
+// so it is like a success case; we had nothing to copy and copied nothing, so we need not fail.
+LOG.warn("Copy failed with INVALID_ARGUMENT for source: {} to target: 
{} snapshot1: {} snapshot2: {} "
++ "params: {}", srcPaths, dst, oldSnapshot, newSnapshot, params);
+return true;
+  } else if (returnCode == DistCpConstants.UNKNOWN_ERROR && conf
+  .getBoolean("hive.repl.externaltable.snapshot.overwrite.target", 
true)) {
+// Check if this error is due to target modified.
+if (shouldRdiff(dst, conf, oldSnapshot)) {
+  LOG.warn("Copy failed due to target modified. Attempting to restore 
back the target. source: {} target: {} "
+  + "snapshot: {}", srcPaths, dst, oldSnapshot);
+  List rParams = constructDistCpWithSnapshotParams(srcPaths, 
dst, ".", oldSnapshot, conf, "-rdiff");
+  DistCp rDistcp = new DistCp(conf, options);
+  returnCode = rDistcp.run(rParams.toArray(new String[0]));
+  if (returnCode == 0) {
+LOG.info("Target restored to previous state.  source: {} target: 
{} snapshot: {}. Reattempting to copy.",
+srcPaths, dst, oldSnapshot);
+dst.getFileSystem(conf).deleteSnapshot(dst, oldSnapshot);
+dst.getFileSystem(conf).createSnapshot(dst, oldSnapshot);
+returnCode = distcp.run(params.toArray(new String[0]));
+if (returnCode == 0) {
+  return true;
+} else {
+  LOG.error("Copy failed with after target restore for source: {} 
to target: {} snapshot1: {} snapshot2: "
+  + "{} params: {}. Return code: {}", srcPaths, dst, 
oldSnapshot, newSnapshot, params, returnCode);
+  return false;
+}
+  }
+}
+  }
+} catch (Exception e) {
+  throw new IOException("Cannot execute DistCp process: ", e);
+} finally {
+  conf.setBoolean("mapred.mapper.new-api", false);
+}
+return false;
+  }
+
+  /**
+   * Checks whether reverse diff on the snapshot should be performed or not.
+   * @param p path where snapshot exists.
+   * @param conf the hive configuration.
+   * @param snapshot the name of snapshot.
+   * @return true, if we need to do rdiff.
+   */
+  private static boolean shouldRdiff(Path p, Configuration conf, String 
snapshot) throws Exception {
+// Using the configuration in string form since hive-shims doesn't have a 
dependency on hive-common.
+boolean isOverwrite = 
conf.getBoolean("hive.repl.externaltable.snapshot.overwrite.target", true);

Review comment:
   We have to be careful to not modify this constant. Can you not pass the 
value of the conf?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597689)
Time Spent: 5.5h  (was: 5h 20m)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: 

[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=597687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597687
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 17/May/21 14:26
Start Date: 17/May/21 14:26
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r633579169



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -342,7 +343,12 @@ public static PathFilter getBootstrapDirectoryFilter(final 
FileSystem fs) {
 
   public static int handleException(boolean isReplication, Throwable e, String 
nonRecoverablePath,
 ReplicationMetricCollector 
metricCollector, String stageName, HiveConf conf){
-int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+int errorCode;
+if (isReplication && e instanceof SnapshotException) {
+  errorCode = ErrorMsg.getErrorMsg("SNAPSHOT_ERROR").getErrorCode();

Review comment:
   Is the actual error msg retained so that users can check that?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597687)
Time Spent: 5h 20m  (was: 5h 10m)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=597681=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597681
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 17/May/21 14:23
Start Date: 17/May/21 14:23
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r633576766



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -217,7 +224,12 @@ public int execute() {
   throw e;
 } catch (Exception e) {
   setException(e);
-  int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+  int errorCode;
+  if (e instanceof SnapshotException) {
+errorCode = ErrorMsg.getErrorMsg("SNAPSHOT_ERROR").getErrorCode();

Review comment:
   why does snapshot error need to be treated specially?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597681)
Time Spent: 5h 10m  (was: 5h)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=597673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597673
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 17/May/21 14:14
Start Date: 17/May/21 14:14
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r633565588



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosUsingSnapshots.java
##
@@ -1258,6 +1259,134 @@ private void validateDiffSnapshotsCreated(String 
location) throws Exception {
 dfs.getFileStatus(new Path(locationPath, ".snapshot/" + 
secondSnapshot(primaryDbName.toLowerCase();
   }
 
+  @Test
+  public void testSnapshotsWithFiltersCustomDbLevelPaths() throws Throwable {
+// Directory Structure:
+///prefix/project/   <- Specified as custom Location.(Snapshot Root)
+///randomStuff <- Not to be copied as part of 
external data copy
+///warehouse1 <- To be copied, Contains table1 & 
table2
+//   /warehouse2 <- To be copied, Contains table3 & 
table4
+
+// Create /prefix/project
+Path project = new Path("/" + testName.getMethodName() + "/project");
+DistributedFileSystem fs = primary.miniDFSCluster.getFileSystem();
+fs.mkdirs(project);
+
+// Create /prefix/project/warehouse1
+Path warehouse1 = new Path(project, "warehouse1");
+fs.mkdirs(warehouse1);
+
+// Create /prefix/project/warehouse2
+Path warehouse2 = new Path(project, "warehouse2");
+fs.mkdirs(warehouse2);
+
+// Table1 Path: /prefix/project/warehouse1/table1
+Path table1 = new Path(warehouse1, "table1");
+fs.mkdirs(table1);
+
+// Table2 Path: /prefix/project/warehouse1/table2
+Path table2 = new Path(warehouse1, "table2");
+fs.mkdirs(table2);
+
+// Table3 Path: /prefix/project/warehouse2/table3
+Path table3 = new Path(warehouse2, "table3");
+fs.mkdirs(table3);
+
+// Table4 Path: /prefix/project/warehouse2/table4
+Path table4 = new Path(warehouse2, "table4");
+fs.mkdirs(table4);
+
+// Random Dir inside the /prefix/project
+Path random = new Path(project, "randomStuff");
+fs.mkdirs(random);
+
+fs.create(new Path(random, "file1")).close();
+fs.create(new Path(random, "file2")).close();
+fs.create(new Path(random, "file3")).close();
+
+// Create a filter file for DistCp
+Path filterFile = new Path("/tmp/filter");
+try(FSDataOutputStream stream = fs.create(filterFile)) {
+  stream.writeBytes(".*randomStuff.*");
+}
+assertTrue(fs.exists(filterFile.makeQualified(fs.getUri(), 
fs.getWorkingDirectory(;
+FileWriter myWriter = new FileWriter("/tmp/filter");
+myWriter.write(".*randomStuff.*");
+myWriter.close();
+
+// Specify the project directory as the snapshot root using the single 
copy task path config.
+List withClause = 
ReplicationTestUtils.includeExternalTableClause(true);
+withClause.add("'"
++ REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK_PATHS.varname + "'='" + 
project
+.makeQualified(fs.getUri(), fs.getWorkingDirectory()).toString() + 
"'");
+
+// Add Filter file
+withClause.add("'distcp.options.filters'='" + "/tmp/filter" + "'");

Review comment:
   Clean up the filter file after the test

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -675,6 +675,16 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 + " table or partition level. If hive.exec.parallel \n"
 + "is set to true then max worker threads created for copy can be 
hive.exec.parallel.thread.number(determines \n"
 + "number of copy tasks in parallel) * hive.repl.parallel.copy.tasks 
"),
+
REPL_SNAPSHOT_DIFF_FOR_EXTERNAL_TABLE_COPY("hive.repl.externaltable.snapshotdiff.copy",
+false,"Use snapshot diff for copying data from source to "
++ "destination cluster for external table in distcp. If true it uses 
snapshot based distcp for all the paths "
++ "configured as part of hive.repl.external.warehouse.single.copy.task 
along with the external warehouse "
++ "default location."),
+
REPL_SNAPSHOT_OVERWRITE_TARGET_FOR_EXTERNAL_TABLE_COPY("hive.repl.externaltable.snapshot.overwrite.target",

Review comment:
   where are you not taking the custom location paths?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id:   

[jira] [Assigned] (HIVE-18044) CompactorMR.CompactorOutputCommitter.abortTask() not implemented

2021-05-17 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma reassigned HIVE-18044:


Assignee: Ashish Sharma

> CompactorMR.CompactorOutputCommitter.abortTask() not implemented
> 
>
> Key: HIVE-18044
> URL: https://issues.apache.org/jira/browse/HIVE-18044
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Ashish Sharma
>Priority: Major
>
> Can it explain the following?
> {noformat}
> Exception running child : org.apache.hadoop.fs.FileAlreadyExistsException: 
> /apps/hiv/workmanagement.db/serviceorder_longtext/_tmp_40a7286b-da40-4624-baf3-4de12ec421f4/base_22699743/bucket_6
>  for client 10.1.71.22 already exists 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2784)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2671)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2555)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:735)
>  
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
>  
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>  
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> {noformat}
> and from yarn app log
> {noformat}
> 2017-11-01 15:44:20,201 FATAL [IPC Server handler 3 on 42141] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1509391924057_1453_m_02_1 - exited : 
> org.apache.hadoop.fs.FileAlreadyExistsException: 
> /apps/hive/warehouse/workmanagement.db/serviceorder_longtext/_tmp_e95a96e2-e605-47d9-b878-bb662cd9ece2/base_22490990/bucket_7
>  for client 10.
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2784)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2671)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2555)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:735)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)

[jira] [Work started] (HIVE-25080) Create metric about oldest entry in "ready for cleaning" state

2021-05-17 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25080 started by Antal Sinkovits.
--
> Create metric about oldest entry in "ready for cleaning" state
> --
>
> Key: HIVE-25080
> URL: https://issues.apache.org/jira/browse/HIVE-25080
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>
> When a compaction txn commits, COMPACTION_QUEUE.CQ_COMMIT_TIME is updated 
> with the current time. Then the compaction state is set to "ready for 
> cleaning". (... and then the Cleaner runs and the state is set to "succeeded" 
> hopefully)
> Based on this we know (roughly) how long a compaction has been in state 
> "ready for cleaning".
> We should create a metric similar to compaction_oldest_enqueue_age_in_sec 
> that would show whether the cleaner is blocked by something, i.e. find the 
> compaction in "ready for cleaning" that has the oldest commit time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25104?focusedWorklogId=597610=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597610
 ]

ASF GitHub Bot logged work on HIVE-25104:
-

Author: ASF GitHub Bot
Created on: 17/May/21 12:47
Start Date: 17/May/21 12:47
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2282:
URL: https://github.com/apache/hive/pull/2282#discussion_r633499434



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
##
@@ -536,7 +542,8 @@ public void write(Object value) {
 Long int64value = ParquetTimestampUtils.getInt64(ts, timeUnit);
 recordConsumer.addLong(int64value);

Review comment:
   The fact that we do not perform/control legacy conversion when we store 
timestamps in INT64 type can create problems if we end up comparing timestamps 
stored as INT96 and INT64. Shall we try to make the new property 
(`hive.parquet.timestamp.write.legacy.conversion.enabled`) affect also the 
INT64 storage type?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597610)
Time Spent: 0.5h  (was: 20m)

> Backward incompatible timestamp serialization in Parquet for certain timezones
> --
>
> Key: HIVE-25104
> URL: https://issues.apache.org/jira/browse/HIVE-25104
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-12192, HIVE-20007 changed the way that timestamp computations are 
> performed and to some extent how timestamps are serialized and deserialized 
> in files (Parquet, Avro, Orc).
> In versions that include HIVE-12192 or HIVE-20007 the serialization in 
> Parquet files is not backwards compatible. In other words writing timestamps 
> with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them 
> with another (not including the previous issues) may lead to different 
> results depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to 
> US/Pacific.
> At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> |1|1880-01-01 00:00:00|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> SELECT * FROM employee;
> {code}
> |1|1879-12-31 23:52:58|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25122) Intermittent test failures in org.apache.hadoop.hive.cli.TestBeeLineDriver

2021-05-17 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346134#comment-17346134
 ] 

Peter Vary commented on HIVE-25122:
---

This seems really strange and has nothing to do with the specific tests. The 
failing commands are:
{code:java}
DROP DATABASE IF EXISTS `test_db_smb_mapjoin_7` CASCADE;
USE default;
{code}

So this is most probably not the failing tests themselves, but some infra or 
general test problem with the {{TestBeeLineDriver}} that manifested as this 
failure.

> Intermittent test failures in org.apache.hadoop.hive.cli.TestBeeLineDriver
> --
>
> Key: HIVE-25122
> URL: https://issues.apache.org/jira/browse/HIVE-25122
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish JP
>Priority: Minor
> Attachments: org.apache.hadoop.hive.cli.TestBeeLineDriver.txt
>
>
> Hive test is failing with error. The build link where it failed: 
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2120/4/tests/]
> Error info: [^org.apache.hadoop.hive.cli.TestBeeLineDriver.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=597603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597603
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 17/May/21 12:36
Start Date: 17/May/21 12:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r633491193



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -7759,6 +7759,18 @@ protected Operator genFileSinkPlan(String dest, QB qb, 
Operator input)
   .getMsg(destinationPath.toUri().toString()));
 }
   }
+  // handle direct insert CTAS case
+  // for direct insert CTAS, the table creation DDL is not added to the 
task plan in TaskCompiler,
+  // therefore we need to add the InsertHook here manually so that 
HiveMetaHook#commitInsertTable is called
+  if (qb.isCTAS() && tableDesc != null && tableDesc.getStorageHandler() != 
null) {
+try {
+  if (HiveUtils.getStorageHandler(conf, 
tableDesc.getStorageHandler()).directInsertCTAS()) {
+createPreInsertDesc(destinationTable, false);
+  }
+} catch (HiveException e) {

Review comment:
   When do we get `HiveException`? Why swallow it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597603)
Time Spent: 1.5h  (was: 1h 20m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=597601=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597601
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 17/May/21 12:35
Start Date: 17/May/21 12:35
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r633490901



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java
##
@@ -134,6 +135,16 @@ public void compile(final ParseContext pCtx,
 boolean isCStats = pCtx.getQueryProperties().isAnalyzeRewrite();
 int outerQueryLimit = pCtx.getQueryProperties().getOuterQueryLimit();
 
+    boolean directInsertCtas = false;
+    if (pCtx.getCreateTable() != null && pCtx.getCreateTable().getStorageHandler() != null) {
+      try {
+        directInsertCtas =
+            HiveUtils.getStorageHandler(conf, pCtx.getCreateTable().getStorageHandler()).directInsertCTAS();
+      } catch (HiveException e) {

Review comment:
   When do we get `HiveException`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597601)
Time Spent: 1h 20m  (was: 1h 10m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=597600=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597600
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 17/May/21 12:35
Start Date: 17/May/21 12:35
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r633490430



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -7759,6 +7759,18 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
   .getMsg(destinationPath.toUri().toString()));
 }
   }
+  // handle direct insert CTAS case
+  // for direct insert CTAS, the table creation DDL is not added to the task plan in TaskCompiler,
+  // therefore we need to add the InsertHook here manually so that HiveMetaHook#commitInsertTable is called

Review comment:
   Where do we add the InsertHook here? I do not really grok the comment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597600)
Time Spent: 1h 10m  (was: 1h)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=597599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597599
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 17/May/21 12:33
Start Date: 17/May/21 12:33
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r633489130



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -361,7 +361,7 @@ private void collectCommitInformation(TezWork work) throws IOException, TezExcep
   .filter(name -> name.endsWith("HiveIcebergNoJobCommitter")).isPresent();
   // we should only consider jobs with Iceberg output committer and a data sink
   if (hasIcebergCommitter && !vertex.getDataSinks().isEmpty()) {
-    String tableLocationRoot = jobConf.get("location");
+    String tableLocationRoot = jobConf.get("iceberg.mr.table.location");

Review comment:
   What is this change?
   Shall we use a constant?
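   As a hedged illustration of the constant idea (only the key string comes from the diff; the holder class and constant name below are assumptions, and an existing Iceberg constant such as one in `InputFormatConfig` may already cover this):
   ```java
   // Hypothetical holder for the key so call sites stop repeating the literal.
   public final class IcebergMRKeys {
     public static final String TABLE_LOCATION = "iceberg.mr.table.location";

     private IcebergMRKeys() {
     }
   }

   // call site: String tableLocationRoot = jobConf.get(IcebergMRKeys.TABLE_LOCATION);
   ```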




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597599)
Time Spent: 1h  (was: 50m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=597598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597598
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 17/May/21 12:32
Start Date: 17/May/21 12:32
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r633488104



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveShell.java
##
@@ -216,6 +216,9 @@ private HiveConf initializeConf() {
 // enables vectorization on Tez
 hiveConf.set("tez.mrreader.config.update.properties", "hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids");
 
+    // set lifecycle hooks
+    hiveConf.setVar(HiveConf.ConfVars.HIVE_QUERY_LIFETIME_HOOKS, HiveIcebergCTASHook.class.getName());

Review comment:
   I have seen another Iceberg hook already. Maybe we would like to keep them as a single class?
   Maybe even if they implement different interfaces? - Just an idea I am playing around with.
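   A rough sketch of the single-class idea, assuming the hook goes through Hive's `QueryLifeTimeHook` interface; the class name and method bodies are illustrative, not the PR's actual hook:
   ```java
   import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
   import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;

   // One Iceberg hook class that could also pick up further hook interfaces later,
   // keeping all hook logic in a single place.
   public class HiveIcebergHooks implements QueryLifeTimeHook {

     @Override
     public void beforeCompile(QueryLifeTimeHookContext ctx) {
       // no-op in this sketch
     }

     @Override
     public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
       // no-op in this sketch
     }

     @Override
     public void beforeExecution(QueryLifeTimeHookContext ctx) {
       // no-op in this sketch
     }

     @Override
     public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
       // e.g. the CTAS rollback logic could live here when hasError is true.
     }
   }
   ```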




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597598)
Time Spent: 50m  (was: 40m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=597596=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597596
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 17/May/21 12:31
Start Date: 17/May/21 12:31
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r633488104



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveShell.java
##
@@ -216,6 +216,9 @@ private HiveConf initializeConf() {
 // enables vectorization on Tez
 hiveConf.set("tez.mrreader.config.update.properties", "hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids");
 
+    // set lifecycle hooks
+    hiveConf.setVar(HiveConf.ConfVars.HIVE_QUERY_LIFETIME_HOOKS, HiveIcebergCTASHook.class.getName());

Review comment:
   I have seen several hooks already. Maybe we would like to keep them as a 
single class?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597596)
Time Spent: 40m  (was: 0.5h)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=597591=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597591
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 17/May/21 12:29
Start Date: 17/May/21 12:29
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r633486560



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -514,6 +519,75 @@ public void testInsertOverwritePartitionedTable() throws IOException {
 HiveIcebergTestUtils.validateData(table, expected, 0);
   }
 
+  @Test
+  public void testCTASFromHiveTable() {
+    Assume.assumeTrue("CTAS target table is supported only for HiveCatalog tables",
+        testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+    shell.executeStatement("CREATE TABLE source (id bigint, name string) PARTITIONED BY (dept string) STORED AS ORC");
+    shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'HR'), (2, 'Linda', 'Finance')");
+
+    shell.executeStatement(String.format(
+        "CREATE TABLE target STORED BY '%s' %s TBLPROPERTIES ('%s'='%s') AS SELECT * FROM source",
+        HiveIcebergStorageHandler.class.getName(),
+        testTables.locationForCreateTableSQL(TableIdentifier.of("default", "target")),
+        TableProperties.DEFAULT_FILE_FORMAT, fileFormat));
+
+    List<Object[]> objects = shell.executeStatement("SELECT * FROM target ORDER BY id");
+    Assert.assertEquals(2, objects.size());
+    Assert.assertArrayEquals(new Object[]{1L, "Mike", "HR"}, objects.get(0));
+    Assert.assertArrayEquals(new Object[]{2L, "Linda", "Finance"}, objects.get(1));
+  }
+
+  @Test
+  public void testCTASFromDifferentIcebergCatalog() {

Review comment:
   Would it be better placed in `TestHiveIcebergStorageHandlerWithMultipleCatalogs`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597591)
Time Spent: 0.5h  (was: 20m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=597588=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597588
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 17/May/21 12:26
Start Date: 17/May/21 12:26
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r633484552



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -514,6 +519,75 @@ public void testInsertOverwritePartitionedTable() throws IOException {
 HiveIcebergTestUtils.validateData(table, expected, 0);
   }
 
+  @Test
+  public void testCTASFromHiveTable() {
+    Assume.assumeTrue("CTAS target table is supported only for HiveCatalog tables",
+        testTableType == TestTables.TestTableType.HIVE_CATALOG);

Review comment:
   Why? What is blocking us?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597588)
Time Spent: 20m  (was: 10m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25034) Implement CTAS for Iceberg

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25034:
--
Labels: pull-request-available  (was: )

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=597587=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597587
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 17/May/21 12:25
Start Date: 17/May/21 12:25
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r633483880



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##
@@ -138,6 +148,17 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
 }
   }
 
+  private void createTableForCTAS(Configuration configuration, Properties serDeProperties) {
+    serDeProperties.setProperty(TableProperties.ENGINE_HIVE_ENABLED, "true");
+    serDeProperties.setProperty(InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(tableSchema));
+    Catalogs.createTable(configuration, serDeProperties);
+    // set these in the global conf so that we can rollback the table in the lifecycle hook in case of failures

Review comment:
   A good candidate for putting into `QueryInfo`, or somewhere else.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597587)
Remaining Estimate: 0h
Time Spent: 10m

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25104?focusedWorklogId=597581=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597581
 ]

ASF GitHub Bot logged work on HIVE-25104:
-

Author: ASF GitHub Bot
Created on: 17/May/21 12:19
Start Date: 17/May/21 12:19
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2282:
URL: https://github.com/apache/hive/pull/2282#discussion_r633480041



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
##
@@ -523,10 +532,30 @@ private static MessageType getRequestedPrunedSchema(
   configuration, HiveConf.ConfVars.HIVE_PARQUET_DATE_PROLEPTIC_GREGORIAN_DEFAULT)));
 }
 
-String legacyConversion = ConfVars.HIVE_PARQUET_TIMESTAMP_LEGACY_CONVERSION_ENABLED.varname;
-if (!metadata.containsKey(legacyConversion)) {
-  metadata.put(legacyConversion, String.valueOf(HiveConf.getBoolVar(
-      configuration, HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_LEGACY_CONVERSION_ENABLED)));
+if (!metadata.containsKey(DataWritableWriteSupport.WRITER_ZONE_CONVERSION_LEGACY)) {
+  final String legacyConversion;
+  if (keyValueMetaData.containsKey(DataWritableWriteSupport.WRITER_ZONE_CONVERSION_LEGACY)) {
+    // If there is meta about the legacy conversion then the file should be read in the same way it was written.
+    legacyConversion = keyValueMetaData.get(DataWritableWriteSupport.WRITER_ZONE_CONVERSION_LEGACY);
+  } else if (keyValueMetaData.containsKey(DataWritableWriteSupport.WRITER_TIMEZONE)) {
+    // If there is no meta about the legacy conversion but there is meta about the timezone then we can infer the
+    // file was written with the new rules.
+    legacyConversion = "false";
+  } else {

Review comment:
   This `if` block makes life a bit easier for users on versions in (3.1.2, 3.2.0), since it automatically determines the appropriate conversion. It looks a bit weird though, so we could possibly remove it and require users on these versions to set the respective property accordingly. I would prefer to keep the code more uniform rather than trying to cover edge cases.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597581)
Time Spent: 20m  (was: 10m)

> Backward incompatible timestamp serialization in Parquet for certain timezones
> --
>
> Key: HIVE-25104
> URL: https://issues.apache.org/jira/browse/HIVE-25104
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-12192 and HIVE-20007 changed the way that timestamp computations are 
> performed and, to some extent, how timestamps are serialized and deserialized 
> in files (Parquet, Avro, Orc).
> In versions that include HIVE-12192 or HIVE-20007 the serialization in 
> Parquet files is not backwards compatible. In other words writing timestamps 
> with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them 
> with another (not including the previous issues) may lead to different 
> results depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to 
> US/Pacific.
> At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> |1|1880-01-01 00:00:00|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> SELECT * FROM employee;
> {code}
> |1|1879-12-31 23:52:58|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25104?focusedWorklogId=597577=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597577
 ]

ASF GitHub Bot logged work on HIVE-25104:
-

Author: ASF GitHub Bot
Created on: 17/May/21 12:07
Start Date: 17/May/21 12:07
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request #2282:
URL: https://github.com/apache/hive/pull/2282


   ### What changes were proposed in this pull request?
   
   1. Add new read/write config properties to control legacy zone conversions 
in Parquet.
   2. Deprecate the hive.parquet.timestamp.legacy.conversion.enabled property since it is not clear whether it applies to conversion during read or write.
   3. Exploit file metadata and property to choose between new/old conversion 
rules.
   4. Update existing tests to remove usages of now deprecated 
hive.parquet.timestamp.legacy.conversion.enabled property.
   5. Simplify NanoTimeUtils#getTimestamp & NanoTimeUtils#getNanoTime by 
removing 'skipConversion' parameter
   
   ### Why are the changes needed?
   1. Provide the end-users the possibility to write backward compatible 
timestamps in Parquet files so that files can be read correctly by older 
versions.
   2. Improve code readability of NanoTimeUtils APIs.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   1. Add timestamp read/write compatibility test with Hive2 Parquet APIs 
(`TestParquetTimestampsHive2Compatibility`)
   2. Add qtest writing timestamps in Parquet using legacy zone conversions 
(`parquet_int96_legacy_compatibility_timestamp.q`)
   ```
   mvn test -Dtest=*Timestamp*
   cd itests/qtest
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex=".*timestamp.*" 
-Dtest.output.overwrite
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597577)
Remaining Estimate: 0h
Time Spent: 10m

> Backward incompatible timestamp serialization in Parquet for certain timezones
> --
>
> Key: HIVE-25104
> URL: https://issues.apache.org/jira/browse/HIVE-25104
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-12192 and HIVE-20007 changed the way that timestamp computations are 
> performed and, to some extent, how timestamps are serialized and deserialized 
> in files (Parquet, Avro, Orc).
> In versions that include HIVE-12192 or HIVE-20007 the serialization in 
> Parquet files is not backwards compatible. In other words writing timestamps 
> with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them 
> with another (not including the previous issues) may lead to different 
> results depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to 
> US/Pacific.
> At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> |1|1880-01-01 00:00:00|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> SELECT * FROM employee;
> {code}
> |1|1879-12-31 23:52:58|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25104:
--
Labels: pull-request-available  (was: )

> Backward incompatible timestamp serialization in Parquet for certain timezones
> --
>
> Key: HIVE-25104
> URL: https://issues.apache.org/jira/browse/HIVE-25104
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-12192 and HIVE-20007 changed the way that timestamp computations are 
> performed and, to some extent, how timestamps are serialized and deserialized 
> in files (Parquet, Avro, Orc).
> In versions that include HIVE-12192 or HIVE-20007 the serialization in 
> Parquet files is not backwards compatible. In other words writing timestamps 
> with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them 
> with another (not including the previous issues) may lead to different 
> results depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to 
> US/Pacific.
> At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> |1|1880-01-01 00:00:00|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> SELECT * FROM employee;
> {code}
> |1|1879-12-31 23:52:58|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=597561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597561
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 17/May/21 11:34
Start Date: 17/May/21 11:34
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r633453402



##
File path: 
ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt
##
@@ -34,20 +34,17 @@ public class  extends VectorExpression {
 
   private static final long serialVersionUID = 1L;
 
-  private final int colNum1;
   private final int colNum2;

Review comment:
   what do you think about this @ramesh0201?
   I think a general input col array would be nice (option b))
   
   however, there are some rare cases where it's not obvious which position should be used, but it's up to agreement, e.g.:
   IfExprScalarColumn.txt
   ```
   protected final int arg1Column;
   protected final  arg2Scalar;
   protected final int arg3Column;
   ```
   this is tricky because there is a scalar interleaved into the columns, input 
col array might look like:
   1. new int[] { arg1Column, -1, arg3Column};
   to emphasize that the second argument is a scalar, so we'll refactor as:
   ```
   arg3Column => inputColumnNums[2]
   ```
   
   2. new int[] { arg1Column, arg3Column, -1};
   to ignore the fact that there is an interleaved scalar input, so we'll 
refactor as:
   ```
   arg3Column => inputColumnNums[1]
   ```
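   A hedged sketch of option b), keeping the argument positions in a single array on the expression; the class and field names below are illustrative stand-ins, not the actual generated template code:
   ```java
   // Sketch of a vector expression that stores all column arguments in one array.
   public abstract class VectorExpressionSketch {
     // inputColumnNums[i] holds the column of the i-th argument; -1 can mark a
     // position occupied by a scalar, or the scalar slot can simply be skipped,
     // matching the two numbering conventions above.
     protected final int[] inputColumnNums;

     protected VectorExpressionSketch(int... inputColumnNums) {
       this.inputColumnNums = inputColumnNums;
     }
   }
   ```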
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597561)
Time Spent: 2h 40m  (was: 2.5h)

> Vectorization: Support PTF - bounded start windows
> --
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> {code}
>  notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is 
> supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we 
> simply remove the check at compile time:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
>   if (!windowFrameDef.isStartUnbounded()) {
> setOperatorIssue(functionName + " only UNBOUNDED start frame is 
> supported");
> return false;
>   }
> {code}
> We get incorrect results because the vectorized codepath completely 
> ignores boundaries and simply iterates through all the input batches in 
> [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
> evaluator.doLastBatchWork();
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25107) Classpath logging should be on DEBUG level

2021-05-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-25107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346072#comment-17346072
 ] 

László Bodor commented on HIVE-25107:
-

PR merged, thanks for the inputs [~pgaref], [~zmatyus]!

> Classpath logging should be on DEBUG level
> --
>
> Key: HIVE-25107
> URL: https://issues.apache.org/jira/browse/HIVE-25107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This has been the case since HIVE-21584.
> I have a *72M* LLAP executor log file; when I grepped for only "thread class 
> path" and piped the matches into a separate file, the result was a *22M* file, so 
> 1/3-1/4 of the file was classpath info, which is not useful most of the time. This 
> overwhelming amount of classpath info is not needed; assuming that classpath 
> issues are reproducible with more or less effort, the user should be responsible 
> for turning on this expensive logging on demand. Not to mention the performance 
> implications, which cannot be ignored beyond a certain amount of log messages.
> https://github.com/apache/hive/commit/a234475faa2cab2606f2a74eb9ca071f006998e2#diff-44b2ff3a3c4a6cfcaed0fcb40b74031844f8586e40a6f8261637e5ebcd558b73R4577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25107) Classpath logging should be on DEBUG level

2021-05-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25107:

Fix Version/s: 4.0.0

> Classpath logging should be on DEBUG level
> --
>
> Key: HIVE-25107
> URL: https://issues.apache.org/jira/browse/HIVE-25107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This has been the case since HIVE-21584.
> I have a *72M* LLAP executor log file; when I grepped for only "thread class 
> path" and piped the matches into a separate file, the result was a *22M* file, so 
> 1/3-1/4 of the file was classpath info, which is not useful most of the time. This 
> overwhelming amount of classpath info is not needed; assuming that classpath 
> issues are reproducible with more or less effort, the user should be responsible 
> for turning on this expensive logging on demand. Not to mention the performance 
> implications, which cannot be ignored beyond a certain amount of log messages.
> https://github.com/apache/hive/commit/a234475faa2cab2606f2a74eb9ca071f006998e2#diff-44b2ff3a3c4a6cfcaed0fcb40b74031844f8586e40a6f8261637e5ebcd558b73R4577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25107) Classpath logging should be on DEBUG level

2021-05-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-25107.
-
Resolution: Fixed

> Classpath logging should be on DEBUG level
> --
>
> Key: HIVE-25107
> URL: https://issues.apache.org/jira/browse/HIVE-25107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This has been the case since HIVE-21584.
> I have a *72M* LLAP executor log file; when I grepped for only "thread class 
> path" and piped the matches into a separate file, the result was a *22M* file, so 
> 1/3-1/4 of the file was classpath info, which is not useful most of the time. This 
> overwhelming amount of classpath info is not needed; assuming that classpath 
> issues are reproducible with more or less effort, the user should be responsible 
> for turning on this expensive logging on demand. Not to mention the performance 
> implications, which cannot be ignored beyond a certain amount of log messages.
> https://github.com/apache/hive/commit/a234475faa2cab2606f2a74eb9ca071f006998e2#diff-44b2ff3a3c4a6cfcaed0fcb40b74031844f8586e40a6f8261637e5ebcd558b73R4577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25107) Classpath logging should be on DEBUG level

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25107?focusedWorklogId=597543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597543
 ]

ASF GitHub Bot logged work on HIVE-25107:
-

Author: ASF GitHub Bot
Created on: 17/May/21 10:55
Start Date: 17/May/21 10:55
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2271:
URL: https://github.com/apache/hive/pull/2271#issuecomment-842226170


   merged, thanks for the review @pgaref 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597543)
Time Spent: 0.5h  (was: 20m)

> Classpath logging should be on DEBUG level
> --
>
> Key: HIVE-25107
> URL: https://issues.apache.org/jira/browse/HIVE-25107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This has been the case since HIVE-21584.
> I have a *72M* LLAP executor log file; when I grepped for only "thread class 
> path" and piped the matches into a separate file, the result was a *22M* file, so 
> 1/3-1/4 of the file was classpath info, which is not useful most of the time. This 
> overwhelming amount of classpath info is not needed; assuming that classpath 
> issues are reproducible with more or less effort, the user should be responsible 
> for turning on this expensive logging on demand. Not to mention the performance 
> implications, which cannot be ignored beyond a certain amount of log messages.
> https://github.com/apache/hive/commit/a234475faa2cab2606f2a74eb9ca071f006998e2#diff-44b2ff3a3c4a6cfcaed0fcb40b74031844f8586e40a6f8261637e5ebcd558b73R4577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25107) Classpath logging should be on DEBUG level

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25107?focusedWorklogId=597542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597542
 ]

ASF GitHub Bot logged work on HIVE-25107:
-

Author: ASF GitHub Bot
Created on: 17/May/21 10:55
Start Date: 17/May/21 10:55
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #2271:
URL: https://github.com/apache/hive/pull/2271


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597542)
Time Spent: 20m  (was: 10m)

> Classpath logging should be on DEBUG level
> --
>
> Key: HIVE-25107
> URL: https://issues.apache.org/jira/browse/HIVE-25107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This has been the case since HIVE-21584.
> I have a *72M* LLAP executor log file; when I grepped for only "thread class 
> path" and piped the matches into a separate file, the result was a *22M* file, so 
> 1/3-1/4 of the file was classpath info, which is not useful most of the time. This 
> overwhelming amount of classpath info is not needed; assuming that classpath 
> issues are reproducible with more or less effort, the user should be responsible 
> for turning on this expensive logging on demand. Not to mention the performance 
> implications, which cannot be ignored beyond a certain amount of log messages.
> https://github.com/apache/hive/commit/a234475faa2cab2606f2a74eb9ca071f006998e2#diff-44b2ff3a3c4a6cfcaed0fcb40b74031844f8586e40a6f8261637e5ebcd558b73R4577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24802) Show operation log at webui

2021-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24802?focusedWorklogId=597541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597541
 ]

ASF GitHub Bot logged work on HIVE-24802:
-

Author: ASF GitHub Bot
Created on: 17/May/21 10:54
Start Date: 17/May/21 10:54
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1998:
URL: https://github.com/apache/hive/pull/1998#discussion_r633430432



##
File path: 
service/src/java/org/apache/hive/service/cli/session/SessionManager.java
##
@@ -281,6 +284,7 @@ private void initOperationLogRootDir() {
 LOG.warn("Failed to schedule cleanup HS2 operation logging root dir: " 
+
 operationLogRootDir.getAbsolutePath(), e);
   }
+  logManager = new OperationLogManager(this, hiveConf);

Review comment:
   Do we need the `OperationLogManager` if we do not use async log removal? 
Could we use `Optional` for this?
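   A minimal sketch of the `Optional` suggestion, assuming the manager is only needed when historic log handling is enabled; the flag, method, and class names are illustrative, not the actual `SessionManager` members:
   ```java
   import java.util.Optional;

   class SessionManagerSketch {
     // Empty unless historic operation-log handling is actually in use.
     private Optional<OperationLogManagerSketch> logManager = Optional.empty();

     void initOperationLogRootDir(boolean historicLogEnabled, Object hiveConf) {
       if (historicLogEnabled) {
         logManager = Optional.of(new OperationLogManagerSketch(this, hiveConf));
       }
     }

     void stop() {
       // Call sites must handle the absent case explicitly.
       logManager.ifPresent(OperationLogManagerSketch::shutdown);
     }

     // Stand-in for the real OperationLogManager.
     static class OperationLogManagerSketch {
       OperationLogManagerSketch(Object sessionManager, Object hiveConf) {
       }

       void shutdown() {
       }
     }
   }
   ```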




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 597541)
Time Spent: 4h  (was: 3h 50m)

> Show operation log at webui
> ---
>
> Key: HIVE-24802
> URL: https://issues.apache.org/jira/browse/HIVE-24802
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Attachments: operationlog.png
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently we provide getQueryLog in HiveStatement to fetch the operation log, 
> and the operation log is deleted when the operation is closed (with a delay for 
> canceled operations). Sometimes it is not easy for the user (JDBC) or 
> administrators to dig into the details of a finished (failed) operation, so 
> we present the operation log on the web UI and keep the operation log for some 
> time for later analysis.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   >