[jira] [Created] (HIVE-24473) Update HBase version to 2.1.10
Istvan Toth created HIVE-24473:
----------------------------------

Summary: Update HBase version to 2.1.10
Key: HIVE-24473
URL: https://issues.apache.org/jira/browse/HIVE-24473
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Affects Versions: 4.0.0
Reporter: Istvan Toth
Assignee: Istvan Toth

Hive currently builds with a 2.0.0 pre-release.
Update HBase to a more recent version.

We cannot use anything later than 2.2.4 because of HBASE-22394, so the options are 2.1.10 and 2.2.4.

I suggest 2.1.10 because it's a chronologically later release, and it maximises compatibility with HBase server deployments.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap
Rajesh Balamohan created HIVE-24472:
---------------------------------------

Summary: Optimize LlapTaskSchedulerService::preemptTasksFromMap
Key: HIVE-24472
URL: https://issues.apache.org/jira/browse/HIVE-24472
Project: Hive
Issue Type: Improvement
Reporter: Rajesh Balamohan
Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png

!Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571!

speculativeTasks could possibly include node information to reduce CPU burn in LlapTaskSchedulerService::preemptTasksFromMap
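To illustrate the general idea of attaching node information to tracked tasks, here is a minimal sketch. It is not Hive's LlapTaskSchedulerService code: the class NodeIndexedTasksSketch, its methods, and the plain String task ids are all hypothetical, and it ignores task priority entirely. It only shows that when tasks are indexed by host, a lookup touches the requested hosts rather than scanning every tracked task.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not the actual LLAP scheduler data structure. */
final class NodeIndexedTasksSketch {
    // host name -> ids of candidate tasks known to run on that host
    private final Map<String, Deque<String>> tasksByHost = new HashMap<>();

    /** Records that a task runs on the given host. */
    void add(String host, String taskId) {
        tasksByHost.computeIfAbsent(host, h -> new ArrayDeque<>()).addLast(taskId);
    }

    /**
     * Removes and returns one task running on any of the given hosts,
     * or null if none exists. Only the queues of the requested hosts are inspected.
     */
    String pollCandidate(Set<String> hosts) {
        for (String host : hosts) {
            Deque<String> queue = tasksByHost.get(host);
            if (queue != null && !queue.isEmpty()) {
                return queue.pollFirst();
            }
        }
        return null;
    }
}
```

A real change would also have to preserve whatever priority ordering the scheduler relies on, which this sketch leaves out.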
[jira] [Created] (HIVE-24471) Add support for combiner in hash mode group aggregation
mahesh kumar behera created HIVE-24471:
------------------------------------------

Summary: Add support for combiner in hash mode group aggregation
Key: HIVE-24471
URL: https://issues.apache.org/jira/browse/HIVE-24471
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: mahesh kumar behera
Assignee: mahesh kumar behera

In map-side group aggregation, a partial grouped aggregation is calculated to reduce the data written to disk by the map task. In the case of hash aggregation, where the input data is not sorted, a hash table is used. If the hash table size grows beyond a configurable limit, the data is flushed to disk and a new hash table is created. If the reduction achieved by the hash table is less than the minimum hash aggregation reduction calculated at compile time, the map-side aggregation is converted to streaming mode. So if the first few batches of records do not yield a significant reduction, the mode is switched to streaming. This may hurt performance if the subsequent batches of records have fewer distinct values.

To mitigate this, a combiner can be added to the map task after the keys are sorted. This makes sure that aggregation is done where possible and reduces the data written to disk.
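The behaviour described above can be sketched with a toy model. This is not Hive's GroupByOperator: the class MapSideAggSketch, the constants MAX_HASH_ENTRIES and MIN_REDUCTION, and the COUNT-per-key aggregation are assumptions chosen for illustration. mapSideAggregate switches to streaming once a flush shows too little reduction, and combineSorted stands in for a combiner that merges rows with equal keys after sorting.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of map-side COUNT(*) GROUP BY key; all names here are hypothetical. */
final class MapSideAggSketch {
    // Stand-ins for the configurable hash table limit and the minimum reduction.
    static final int MAX_HASH_ENTRIES = 4;
    static final double MIN_REDUCTION = 0.5; // a flush must emit at most 50% of its input rows

    private static Map.Entry<String, Long> row(String key, long count) {
        return new AbstractMap.SimpleImmutableEntry<>(key, count);
    }

    /** Hash-mode aggregation with fallback to streaming; returns the emitted (key, count) rows. */
    static List<Map.Entry<String, Long>> mapSideAggregate(List<String> keys) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        Map<String, Long> hash = new HashMap<>();
        boolean streaming = false;
        long rowsSinceFlush = 0;
        for (String key : keys) {
            if (streaming) {
                out.add(row(key, 1L)); // no aggregation: one output row per input row
                continue;
            }
            hash.merge(key, 1L, Long::sum);
            rowsSinceFlush++;
            if (hash.size() >= MAX_HASH_ENTRIES) {
                double ratio = (double) hash.size() / rowsSinceFlush;
                hash.forEach((k, c) -> out.add(row(k, c))); // flush the table
                hash.clear();
                rowsSinceFlush = 0;
                if (ratio > MIN_REDUCTION) {
                    streaming = true; // too little reduction: give up on hashing
                }
            }
        }
        hash.forEach((k, c) -> out.add(row(k, c))); // flush whatever is left
        return out;
    }

    /** Combiner sketch: sort the emitted rows by key, then merge neighbours with equal keys. */
    static List<Map.Entry<String, Long>> combineSorted(List<Map.Entry<String, Long>> rows) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(rows);
        sorted.sort(Map.Entry.comparingByKey());
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : sorted) {
            int last = out.size() - 1;
            if (last >= 0 && out.get(last).getKey().equals(e.getKey())) {
                out.set(last, row(e.getKey(), out.get(last).getValue() + e.getValue()));
            } else {
                out.add(e);
            }
        }
        return out;
    }
}
```

With an input whose first four keys are all distinct and whose later keys repeat heavily, the sketch falls back to streaming after the first flush and emits one row per remaining input row. Passing that output through combineSorted collapses it back to one row per distinct key.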
[jira] [Created] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
Cameron Moberg created HIVE-24470:
-------------------------------------

Summary: Separate HiveMetastore Thrift and Driver logic
Key: HIVE-24470
URL: https://issues.apache.org/jira/browse/HIVE-24470
Project: Hive
Issue Type: Improvement
Components: Standalone Metastore
Reporter: Cameron Moberg
Assignee: Cameron Moberg

In the file HiveMetastore.java, the majority of the code is a Thrift interface rather than the actual logic behind starting the Hive Metastore. The Thrift interface code should be moved out into a separate file to clean up the file.
[jira] [Created] (HIVE-24469) StatsTask failure while inserting the data into the table partitioned by timestamp
Rajkumar Singh created HIVE-24469:
-------------------------------------

Summary: StatsTask failure while inserting the data into the table partitioned by timestamp
Key: HIVE-24469
URL: https://issues.apache.org/jira/browse/HIVE-24469
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 4.0.0
Reporter: Rajkumar Singh

Steps to repro:
{code:java}
CREATE EXTERNAL TABLE `tblsource`(
  `x` int,
  `y` string)
STORED AS PARQUET;

CREATE EXTERNAL TABLE `tblinsert`(
  `x` int)
PARTITIONED BY (
  `y` timestamp)
STORED AS PARQUET;

insert into table tblsource values (5,'2020-11-06 00:00:00.000');
insert into table tblinsert partition(y) select * from tblsource distribute by (y);
{code}

The query fails while executing the stats task, and I can see the following exception in HMS:
{code:java}
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
 at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
 at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartColumnStatsWithMerge(HiveMetaStore.java:8629) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:8590) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_232]
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_232]
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_232]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_232]
 at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
 at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
 at com.sun.proxy.$Proxy28.set_aggr_stats_for(Unknown Source) ~[?:?]
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:18937) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:18921) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
 at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
 at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
 at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_232]
 at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_232]
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[hadoop-common-3.1.1.7.2.0.0-237.jar:?]
 at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
 at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
{code}

I think the problem is with timestamps whose nanoseconds are all zeros. After inserting the value 2020-11-06 00:00:00.000, Hive performs set_aggr_stats_for and constructs the SetPartitionsStatsRequest. During construction of the request, since the nanoseconds are all 0, the Hive FetchOperator converts 2020-11-06 00:00:00.000 to 2020-11-06 00:00:00 (Timestamp.valueOf(string)).

https://github.com/apache/hive/blob/f8aa55f9c8f22c4fd293d9531192f7f46099a420/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L176

On the HMS side,

https://github.com/apache/hive/blob/2ab194d25311e15487ae010b8dd113879ccd501b/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L8626

does not yield any partition because the filter expression for the partition was 2020-11-06 00:00:00, hence it fails with the above-mentioned IndexOutOfBoundsException.
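The suspected mismatch can be illustrated outside Hive with a small sketch. It does not use Hive's Timestamp class or the metastore API: the class TimestampPartitionSketch, the normalize and findPartitions helpers, and the idea that partitions are looked up by exact string match are assumptions made for illustration. It only shows that a parse/print round trip can drop an all-zero fraction, after which an exact-match lookup on the original string finds nothing.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not Hive's Timestamp or HMS partition lookup. */
final class TimestampPartitionSketch {
    // Parses an optional fraction of up to 9 digits; prints no fraction when nanos are zero.
    static final DateTimeFormatter FMT = new DateTimeFormatterBuilder()
            .appendPattern("yyyy-MM-dd HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
            .toFormatter();

    /** Stand-in for a string -> timestamp -> string round trip. */
    static String normalize(String ts) {
        return LocalDateTime.parse(ts, FMT).format(FMT);
    }

    /** Stand-in for an exact-match lookup of partitions by their stored value. */
    static List<String> findPartitions(Map<String, List<String>> byValue, String value) {
        return byValue.getOrDefault(value, Collections.emptyList());
    }
}
```

In this sketch, a partition stored under 2020-11-06 00:00:00.000 is not found when the lookup uses the normalized value. A caller that expects one partition per requested name would then have to handle the empty result explicitly.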
[jira] [Created] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
David Mollitor created HIVE-24468:
-------------------------------------

Summary: Use Event Time instead of Current Time in Notification Log DB Entry
Key: HIVE-24468
URL: https://issues.apache.org/jira/browse/HIVE-24468
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor