[jira] [Created] (HIVE-24790) Batch column stats updates to HMS

2021-02-16 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-24790:
---

 Summary: Batch column stats updates to HMS
 Key: HIVE-24790
 URL: https://issues.apache.org/jira/browse/HIVE-24790
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Rajesh Balamohan


When a large number of partitions are inserted or updated, it would be good to 
batch column statistics updates to HMS.
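
As a rough illustration of the batching idea, the sketch below splits a list of per-partition stats entries into fixed-size chunks, each of which could be sent to HMS in its own request. It is not taken from Hive: the class `StatsBatcher`, the method `partition`, the batch size of 4 and the string placeholders standing in for stats objects are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive: splits a list of per-partition
// stats entries into fixed-size batches.
final class StatsBatcher {
    private StatsBatcher() {}

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Placeholder strings stand in for per-partition column stats objects.
        List<String> partitionStats = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            partitionStats.add("part=" + i);
        }
        // Assumption: each batch would become one metastore call.
        for (List<String> batch : partition(partitionStats, 4)) {
            System.out.println("would send " + batch.size() + " entries: " + batch);
        }
    }
}
```

With smaller requests, each call has less work to finish within the client's socket read timeout; a suitable batch size would have to be determined by measurement.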

Currently, HS2 ends up throwing a read timeout exception when updating HMS.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L180

{noformat}
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_252]
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_252]
    at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_252]
    at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_252]
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_252]
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_252]
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_252]
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ~[hive-exec-3.1]
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) ~[hive-exec-3.1]
    at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374) ~[hive-exec-3.1]
    at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451) ~[hive-exec-3.1]
    at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433) ~[hive-exec-3.1]
    at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) ~[hive-exec-3.1]
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) ~[hive-exec-3.1]
    at org.apache.hadoop.hive.metastore.security.TFilterTransport.readAll(TFilterTransport.java:62) ~[hive-exec-3.1]
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) ~[hive-exec-3.1]
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) ~[hive-exec-3.1]
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) ~[hive-exec-3.1]
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) ~[hive-exec-3.1]
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_aggr_stats_for(ThriftHiveMetastore.java:4561) ~[hive-exec-3.1]
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_aggr_stats_for(ThriftHiveMetastore.java:4548) ~[hive-exec-3.1]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.setPartitionColumnStatistics(HiveMetaStoreClient.java:2496) ~[hive-exec-3.1]
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.setPartitionColumnStatistics(SessionHiveMetaStoreClient.java:515) ~[hive-exec-3.1]
    at sun.reflect.GeneratedMethodAccessor194.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_252]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_252]
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212) ~[hive-exec-3.1]
    at com.sun.proxy.$Proxy60.setPartitionColumnStatistics(Unknown Source) ~[?:?]
    at sun.reflect.GeneratedMethodAccessor194.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_252]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_252]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3431) ~[hive-exec-3.1]
    at com.sun.proxy.$Proxy60.setPartitionColumnStatistics(Unknown Source) ~[?:?]
    at org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:5213) ~[hive-exec-3.1]
    at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:192) ~[hive-exec-3.1]
    at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:87) ~[hive-exec-3.1]
    at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:107) ~[hive-exec-3.1]
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1]
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1]
    at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) ~[hive-exec-3.1]
{noformat}

 

[jira] [Created] (HIVE-24789) Unable to verify hive-1.2.2 integrity

2021-02-16 Thread Sebastian (Jira)
Sebastian created HIVE-24789:


 Summary: Unable to verify hive-1.2.2 integrity
 Key: HIVE-24789
 URL: https://issues.apache.org/jira/browse/HIVE-24789
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.2
Reporter: Sebastian


h3. *Steps to reproduce*
Run the following commands -
{noformat}
curl -fSL -o KEYS https://archive.apache.org/dist/hive/KEYS
export HIVE_VERSION=1.2.2
curl -fSL -o hive.tar.gz https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz
curl -fSL -o hive.tar.gz.asc https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz.asc
export GNUPGHOME="$(mktemp -d)"
gpg --import KEYS
gpg --verify hive.tar.gz.asc hive.tar.gz
{noformat}
h3. *Expected Result*
{noformat}
gpg: Signature made Sun Apr  2 20:16:19 2017 UTC
gpg:                using RSA key F2384CC9084FCC30
gpg: Good signature from "Vaibhav Gumashta (CODE SIGNING KEY)" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2FF7 1C57 D64C 9AB4 F21A  68D7 F238 4CC9 084F CC30
{noformat}
h3. *Actual Result*
It seems that the [KEYS|https://archive.apache.org/dist/hive/KEYS] file is 
missing the key *F2384CC9084FCC30*:
{noformat}
$ gpg --verify hive.tar.gz.asc hive.tar.gz
gpg: Signature made Sun Apr  2 20:16:19 2017 UTC
gpg:                using RSA key F2384CC9084FCC30
gpg: Can't check signature: No public key
{noformat}

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24788) Backport HIVE-23338 to branch-3.1

2021-02-16 Thread H. Vetinari (Jira)
H. Vetinari created HIVE-24788:
--

 Summary: Backport HIVE-23338 to branch-3.1
 Key: HIVE-24788
 URL: https://issues.apache.org/jira/browse/HIVE-24788
 Project: Hive
  Issue Type: Task
Reporter: H. Vetinari


Jackson has a large number of CVEs open against 2.9.x, which makes working with 
Hive in security-aware environments quite difficult.

This has already been fixed in HIVE-23338, but since 4.0.0 hasn't been released 
yet (and is not on the horizon, as far as I can tell), the fix should be 
backported to `branch-3.1`.





[jira] [Created] (HIVE-24787) Hive - upgrade log4j 2.12.1 to 2.13.2+ due to CVE-2020-9488

2021-02-16 Thread Revival Vape (Jira)
Revival Vape created HIVE-24787:
---

 Summary: Hive - upgrade log4j 2.12.1 to 2.13.2+ due to 
CVE-2020-9488
 Key: HIVE-24787
 URL: https://issues.apache.org/jira/browse/HIVE-24787
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Revival Vape
Assignee: Revival Vape


Hive pulls in log4j 2.12.1, specifically:
 * ./usr/lib/hive/lib/log4j-core-2.12.1.jar

CVE-2020-9488 affects this version, and the fix is to upgrade to 2.13.2+, so 
this dependency should be upgraded.





[jira] [Created] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods

2021-02-16 Thread Prasanth Jayachandran (Jira)
Prasanth Jayachandran created HIVE-24786:


 Summary: JDBC HttpClient should retry for idempotent and unsent 
http methods
 Key: HIVE-24786
 URL: https://issues.apache.org/jira/browse/HIVE-24786
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


When HiveServer2 is behind multiple proxies, "broken pipe", "connect timeout" 
and "read timeout" exceptions are possible if one of the intermediate proxies 
or load balancers resets the underlying TCP socket after an idle timeout. When 
a query is submitted after the connection has been broken this way, the 
connection still looks open from the perspective of beeline (or any other 
client), but HTTP methods (POST/GET) fail with socket-related exceptions. Since 
these requests are not sent to the server, they are safe for client-side 
retries.
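
The retry rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the change made in Hive: the class `RetryDecision`, the method `shouldRetry` and its parameters are hypothetical, and the set of idempotent methods follows RFC 7231, section 4.2.2.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Hive or Apache HttpClient.
final class RetryDecision {
    // HTTP methods defined as idempotent by RFC 7231, section 4.2.2.
    private static final Set<String> IDEMPOTENT = new HashSet<>(
            Arrays.asList("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"));

    private RetryDecision() {}

    /**
     * @param method      HTTP method name, e.g. "POST"
     * @param requestSent whether the request was fully written to the socket
     * @param attempt     number of retries already performed
     * @param maxRetries  upper bound on retries
     */
    static boolean shouldRetry(String method, boolean requestSent,
                               int attempt, int maxRetries) {
        if (attempt >= maxRetries) {
            return false; // retry budget exhausted
        }
        if (!requestSent) {
            return true; // the server never saw the request, so a retry is safe
        }
        // The request may have reached the server: retry only idempotent methods.
        return IDEMPOTENT.contains(method.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println("unsent POST: " + shouldRetry("POST", false, 0, 3));
        System.out.println("sent POST:   " + shouldRetry("POST", true, 0, 3));
        System.out.println("sent GET:    " + shouldRetry("GET", true, 0, 3));
    }
}
```

In a real client, the `requestSent` flag would have to come from the HTTP library's own bookkeeping; how that is exposed depends on the library and version in use.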





[jira] [Created] (HIVE-24785) Fix HIVE_COMPACTOR_COMPACT_MM property

2021-02-16 Thread Peter Varga (Jira)
Peter Varga created HIVE-24785:
--

 Summary: Fix HIVE_COMPACTOR_COMPACT_MM property
 Key: HIVE-24785
 URL: https://issues.apache.org/jira/browse/HIVE-24785
 Project: Hive
  Issue Type: Bug
Reporter: Peter Varga
Assignee: Peter Varga


Currently this property disables query-based compaction for MM tables, but the 
Worker then falls back to MR-based compaction, which is not implemented for MM 
tables. The property should instead disable compaction in the Initiator.


