[jira] [Created] (HIVE-25191) Modernize Hive Thrift CLI Service Protocol

2021-06-02 Thread Matt McCline (Jira)
Matt McCline created HIVE-25191:
---

 Summary: Modernize Hive Thrift CLI Service Protocol
 Key: HIVE-25191
 URL: https://issues.apache.org/jira/browse/HIVE-25191
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline


Proxies such as gateways between the Hive client and Hive Server 2 are causing 
unnecessary errors: queries can fail because of arbitrary proxy timeouts. This 
proposal avoids the timeouts by changing the protocol to do regular polling. 
Currently, the Hive client sends the query compile request as a single request, 
so a long compile time leaves that request vulnerable to the proxy timeouts.

Another issue is that Hive Server 2 sometimes does not notice that the client 
has failed or has lost interest in a potentially long-running query. This 
causes Hive locks and Big Data query resources to be held unnecessarily. The 
assumption is 
the client issues a cancel query request when it gets an error. This assumption 
does not always hold. If the proxy returned an error itself, that proxy may 
reject the subsequent cancel request, too. And, if the client is killed or the 
network is down, the client cannot complete a cancel request. The proposed 
solution here is for Hive Server 2 to watch that the client is sending regular 
polling requests for status. If a client ceases those requests, then Hive 
Server 2 will cancel the query.
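
The server-side half of this proposal could be sketched roughly as follows. This is only an illustration under stated assumptions: the class PollWatchdog, its methods, and the timeout value are hypothetical and are not part of the Hive Thrift CLI service. The idea is that every status poll refreshes a timestamp, and a periodic sweep reports the queries whose client has stopped polling so that they can be cancelled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical watchdog; names and structure are assumptions, not Hive code. */
class PollWatchdog {
    private final long timeoutMillis;
    // Query id -> time of the most recent status poll from the client.
    private final Map<String, Long> lastPollMillis = new ConcurrentHashMap<>();

    PollWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a status poll; the first poll also registers the query. */
    void recordPoll(String queryId, long nowMillis) {
        lastPollMillis.put(queryId, nowMillis);
    }

    /**
     * Returns the ids of queries that have not been polled within the timeout
     * and stops tracking them. The caller would cancel these queries.
     */
    List<String> expire(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastPollMillis.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                expired.add(entry.getKey());
            }
        }
        for (String queryId : expired) {
            lastPollMillis.remove(queryId);
        }
        Collections.sort(expired); // deterministic order for callers
        return expired;
    }
}
```

A real server would call something like expire() from a scheduled task and cancel each returned query, which releases its locks and resources even when the client can no longer send a cancel request.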

Hive owns the JDBC path (i.e. HiveDriver). The ODBC path may be more 
challenging because vendors provide ODBC drivers and Hive does not own the ODBC 
protocol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25190) BytesColumnVector fails when the aggregate size is > 1gb

2021-06-02 Thread Owen O'Malley (Jira)
Owen O'Malley created HIVE-25190:


 Summary: BytesColumnVector fails when the aggregate size is > 1gb
 Key: HIVE-25190
 URL: https://issues.apache.org/jira/browse/HIVE-25190
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently, BytesColumnVector allocates a buffer for small values (< 1 MB), 
but fails with:

{code:java}
new RuntimeException("Overflow of newLength. smallBuffer.length="
+ smallBuffer.length + ", nextElemLength=" + nextElemLength);
{code}

if the aggregate size of the buffer exceeds 1 GB.
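
One plausible way to hit this limit, assuming the buffer length is a Java int that grows by doubling (an assumption made for illustration, not something this report confirms), is that doubling a length of 2^30 wraps to a negative number. The class BufferGrowth below is hypothetical and only demonstrates the arithmetic:

```java
/** Illustrative only: mimics growing an int-sized buffer length by doubling. */
class BufferGrowth {
    /**
     * Doubles current until it is at least required.
     * Throws once the doubled value no longer fits in a positive int.
     */
    static int nextLength(int current, int required) {
        int newLength = current;
        while (newLength < required) {
            newLength *= 2; // 2^30 * 2 wraps to Integer.MIN_VALUE
            if (newLength <= 0) {
                throw new RuntimeException("Overflow of newLength. current="
                    + current + ", required=" + required);
            }
        }
        return newLength;
    }
}
```

Under this assumption, a 1 MB buffer doubles without trouble, but a 1 GB buffer cannot double again, no matter how small the next element is.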





[jira] [Created] (HIVE-25189) Cache the validWriteIdList in query cache before fetching tables from HMS

2021-06-02 Thread Steve Carlin (Jira)
Steve Carlin created HIVE-25189:
---

 Summary: Cache the validWriteIdList in query cache before fetching tables from HMS
 Key: HIVE-25189
 URL: https://issues.apache.org/jira/browse/HIVE-25189
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Steve Carlin


For a small performance boost at compile time, we should fetch the 
validWriteIdList before fetching the tables.  HMS allows these to be batched 
together in one call.  This prevents the getTable API from being called 
twice; currently the first call passes in a null for validWriteIdList, so 
a second call is needed.





[jira] [Created] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-02 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-25188:
--

 Summary: JsonSerDe: Unable to read the string value from a nested json
 Key: HIVE-25188
 URL: https://issues.apache.org/jira/browse/HIVE-25188
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 4.0.0
Reporter: Zhihua Deng
Assignee: Zhihua Deng


Steps to reproduce:
create table json_table(data string, messageid string, publish_time bigint, 
attributes string);
 
If the data in the table is stored like:
{code:java}
{"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}
{code}
an exception is thrown when trying to deserialize the data:
 
Caused by: java.lang.IllegalArgumentException
 at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
 at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
 at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
 at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
 at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
 at org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
 at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)





[jira] [Created] (HIVE-25187) Reduce number of getPartition calls during loadDynamicPartitions for Managed Tables

2021-06-02 Thread Narayanan Venkateswaran (Jira)
Narayanan Venkateswaran created HIVE-25187:
--

 Summary: Reduce number of getPartition calls during loadDynamicPartitions for Managed Tables
 Key: HIVE-25187
 URL: https://issues.apache.org/jira/browse/HIVE-25187
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Narayanan Venkateswaran


When dynamic partitions are loaded, Hive::loadDynamicPartition loads all 
partitions from HMS, putting heavy load on it. This gets worse when a table 
has a large number of partitions.

Instead, only the partitions actually being loaded need to be queried from 
HMS to check whether they already exist.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958]
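
The difference between the two approaches can be sketched as below. Everything here is a hypothetical stand-in written for illustration: FakeMetastore is an in-memory substitute for HMS, and its method names are assumptions rather than the actual metastore client API. It counts how many partitions each approach pulls back.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory stand-in for HMS; counts the partitions it returns. */
class FakeMetastore {
    private final Set<String> partitions;
    int partitionsReturned = 0;

    FakeMetastore(Collection<String> names) {
        this.partitions = new HashSet<>(names);
    }

    /** Expensive path: returns every partition of the table. */
    List<String> getAllPartitions() {
        partitionsReturned += partitions.size();
        return new ArrayList<>(partitions);
    }

    /** Cheaper path: returns only the requested partitions that exist. */
    List<String> getPartitionsByNames(Collection<String> names) {
        List<String> found = new ArrayList<>();
        for (String name : names) {
            if (partitions.contains(name)) {
                found.add(name);
            }
        }
        partitionsReturned += found.size();
        return found;
    }
}

class PartitionExistence {
    /** Checks existence only for the partitions that are being loaded. */
    static Set<String> existingAmong(FakeMetastore hms, Collection<String> beingLoaded) {
        return new HashSet<>(hms.getPartitionsByNames(beingLoaded));
    }
}
```

With a table of 1000 partitions and a load that touches two partition names, the targeted lookup returns at most two entries, while the full listing returns all 1000.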


