[jira] [Created] (HIVE-25191) Modernize Hive Thrift CLI Service Protocol
Matt McCline created HIVE-25191:
---

Summary: Modernize Hive Thrift CLI Service Protocol
Key: HIVE-25191
URL: https://issues.apache.org/jira/browse/HIVE-25191
Project: Hive
Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline

Unnecessary errors are occurring now that proxies such as gateways sit between the Hive client and Hive Server 2. Query failures can be caused by arbitrary proxy timeouts. This proposal avoids the timeouts by changing the protocol to do regular polling.

Currently, the Hive client uses a single request for query compilation. Long query compile times make those requests vulnerable to the arbitrary proxy timeouts.

Another issue is that Hive Server 2 sometimes does not notice that the client has failed or has lost interest in a potentially long-running query. This causes Hive locks and Big Data query resources to be held unnecessarily. The assumption is that the client issues a cancel query request when it gets an error. This assumption does not always hold. If the proxy itself returned the error, that proxy may reject the subsequent cancel request, too. And if the client is killed or the network is down, the client cannot complete a cancel request.

The proposed solution is for Hive Server 2 to check that the client is sending regular polling requests for status. If a client stops sending those requests, Hive Server 2 will cancel the query.

Hive owns the JDBC path (i.e. HiveDriver). The ODBC path may be more challenging because vendors provide the ODBC drivers and Hive does not own the ODBC protocol.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
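The server-side half of the HIVE-25191 proposal (cancel a query when its client stops polling) can be illustrated with a minimal sketch. Everything here is hypothetical: `PollWatchdog`, `Canceller`, the method names, and the timeout handling are illustrative assumptions, not Hive Server 2 code or the Thrift CLI service API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: cancels queries whose clients have stopped polling for status. */
final class PollWatchdog {
    /** Hypothetical callback standing in for whatever the server uses to cancel a query. */
    interface Canceller {
        void cancel(String queryId);
    }

    // Query id -> timestamp (ms) of the most recent status poll seen for that query.
    private final Map<String, Long> lastPollMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;
    private final Canceller canceller;

    PollWatchdog(long timeoutMillis, Canceller canceller) {
        this.timeoutMillis = timeoutMillis;
        this.canceller = canceller;
    }

    /** Records a status poll from the client; also registers the query on first call. */
    void onStatusPoll(String queryId, long nowMillis) {
        lastPollMillis.put(queryId, nowMillis);
    }

    /** Stops watching a query that completed normally. */
    void onQueryFinished(String queryId) {
        lastPollMillis.remove(queryId);
    }

    /** Cancels every query whose last poll is older than the timeout; returns how many. */
    int sweep(long nowMillis) {
        int cancelled = 0;
        for (Map.Entry<String, Long> entry : lastPollMillis.entrySet()) {
            Long seen = entry.getValue();
            if (nowMillis - seen > timeoutMillis
                    // Conditional remove: skipped if a newer poll arrived in the meantime.
                    && lastPollMillis.remove(entry.getKey(), seen)) {
                canceller.cancel(entry.getKey());
                cancelled++;
            }
        }
        return cancelled;
    }
}
```

In such a design, `sweep` would presumably run on a periodic timer, and the timeout would need to be comfortably longer than the client's polling interval so that a single delayed poll does not cancel a healthy query.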
[jira] [Created] (HIVE-25190) BytesColumnVector fails when the aggregate size is > 1gb
Owen O'Malley created HIVE-25190:

Summary: BytesColumnVector fails when the aggregate size is > 1gb
Key: HIVE-25190
URL: https://issues.apache.org/jira/browse/HIVE-25190
Project: Hive
Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley

Currently, BytesColumnVector will allocate a buffer for small values (< 1 MB), but fail with:

{code:java}
new RuntimeException("Overflow of newLength. smallBuffer.length=" + smallBuffer.length + ", nextElemLength=" + nextElemLength);
{code}

if the aggregate size of the buffer crosses over 1 GB.
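The "Overflow of newLength" message suggests that a buffer length computed with `int` arithmetic wraps around. The sketch below shows how doubling a 1 GB length overflows in Java and one way to grow defensively. It is an assumption about the failure mode, not the actual BytesColumnVector code; `BufferGrowth`, `doubleNaive`, `grow`, and `MAX_ARRAY_LENGTH` are hypothetical names.

```java
/** Sketch of buffer-growth arithmetic; not the actual BytesColumnVector implementation. */
final class BufferGrowth {
    /** Assumed upper bound for a byte[] length, slightly below Integer.MAX_VALUE. */
    static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    private BufferGrowth() {}

    /** Doubling with int arithmetic: the result wraps negative once it reaches 2^31. */
    static int doubleNaive(int currentLength) {
        return currentLength * 2;
    }

    /** Overflow-aware growth: compute in long, then clamp to the maximum array length. */
    static int grow(int currentLength, int required) {
        if (currentLength < 0 || required < 0 || required > MAX_ARRAY_LENGTH) {
            throw new IllegalArgumentException(
                "currentLength=" + currentLength + ", required=" + required);
        }
        // Widen before multiplying so that 2 * 2^30 stays positive.
        long doubled = Math.max(1L, (long) currentLength * 2L);
        long target = Math.max(doubled, (long) required);
        return (int) Math.min(target, (long) MAX_ARRAY_LENGTH);
    }
}
```

With these definitions, `BufferGrowth.doubleNaive(1 << 30)` evaluates to `Integer.MIN_VALUE`, while `BufferGrowth.grow(1 << 30, (1 << 30) + 1)` returns the clamped maximum instead of a negative length.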
[jira] [Created] (HIVE-25189) Cache the validWriteIdList in query cache before fetching tables from HMS
Steve Carlin created HIVE-25189:
---

Summary: Cache the validWriteIdList in query cache before fetching tables from HMS
Key: HIVE-25189
URL: https://issues.apache.org/jira/browse/HIVE-25189
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Steve Carlin

For a small performance boost at compile time, we should fetch the validWriteIdList before fetching the tables. HMS allows these to be batched together in one call.

This will prevent the getTable API from being called twice, which happens today because the first call passes in null for validWriteIdList.
[jira] [Created] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json
Zhihua Deng created HIVE-25188:
--

Summary: JsonSerDe: Unable to read the string value from a nested json
Key: HIVE-25188
URL: https://issues.apache.org/jira/browse/HIVE-25188
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Affects Versions: 4.0.0
Reporter: Zhihua Deng
Assignee: Zhihua Deng

Steps to reproduce:

create table json_table(data string, messageid string, publish_time bigint, attributes string);

If the data in the table is stored like this:

{code:java}
{"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}
{code}

an exception is thrown when trying to deserialize the data:

Caused by: java.lang.IllegalArgumentException
 at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
 at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
 at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
 at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
 at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
 at org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
 at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)
[jira] [Created] (HIVE-25187) Reduce number of getPartition calls during loadDynamicPartitions for Managed Tables
Narayanan Venkateswaran created HIVE-25187:
--

Summary: Reduce number of getPartition calls during loadDynamicPartitions for Managed Tables
Key: HIVE-25187
URL: https://issues.apache.org/jira/browse/HIVE-25187
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Narayanan Venkateswaran

When dynamic partitions are loaded, Hive::loadDynamicPartitions loads all partitions from HMS, causing heavy load on it. This becomes worse when a large number of partitions are present in the tables.

Instead, only the partitions actually being loaded need to be queried from HMS to check for partition existence.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958]
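The idea in HIVE-25187 (ask the metastore only about the partitions being loaded, rather than listing every partition of the table) can be sketched as follows. `PartitionLookup` and `splitByExistence` are hypothetical names used for illustration; they are not the HMS client API or the code in Hive.java.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: check existence only for the partitions being loaded. */
final class RelevantPartitionCheck {
    /** Hypothetical stand-in for a metastore lookup restricted to the given names. */
    interface PartitionLookup {
        Set<String> findExisting(Collection<String> partitionNames);
    }

    private RelevantPartitionCheck() {}

    /**
     * Splits the loaded partition names into those that already exist (key true)
     * and those that still have to be created (key false), using one targeted lookup.
     */
    static Map<Boolean, List<String>> splitByExistence(
            List<String> loadedPartitions, PartitionLookup lookup) {
        // A single call scoped to the loaded partitions, not a full table listing.
        Set<String> existing = lookup.findExisting(loadedPartitions);
        Map<Boolean, List<String>> result = new HashMap<>();
        result.put(Boolean.TRUE, new ArrayList<>());
        result.put(Boolean.FALSE, new ArrayList<>());
        for (String name : loadedPartitions) {
            result.get(existing.contains(name)).add(name);
        }
        return result;
    }
}
```

The cost of the lookup then scales with the number of partitions written by the query rather than with the total number of partitions in the table.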