[jira] [Created] (HIVE-24473) Update HBase version to 2.1.10

2020-12-02 Thread Istvan Toth (Jira)
Istvan Toth created HIVE-24473:
--

 Summary: Update HBase version to 2.1.10
 Key: HIVE-24473
 URL: https://issues.apache.org/jira/browse/HIVE-24473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 4.0.0
Reporter: Istvan Toth
Assignee: Istvan Toth


Hive currently builds against a 2.0.0 pre-release of HBase.

Update HBase to a more recent version.

We cannot use anything later than 2.2.4 because of HBASE-22394.

So the options are 2.1.10 and 2.2.4.

I suggest 2.1.10 because it is the chronologically later release, and it maximises 
compatibility with HBase server deployments.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap

2020-12-02 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-24472:
---

 Summary: Optimize LlapTaskSchedulerService::preemptTasksFromMap
 Key: HIVE-24472
 URL: https://issues.apache.org/jira/browse/HIVE-24472
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan
 Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png

!Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571!

speculativeTasks could include node information to reduce CPU burn in 
LlapTaskSchedulerService::preemptTasksFromMap.

 

 





[jira] [Created] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-02 Thread mahesh kumar behera (Jira)
mahesh kumar behera created HIVE-24471:
--

 Summary: Add support for combiner in hash mode group aggregation 
 Key: HIVE-24471
 URL: https://issues.apache.org/jira/browse/HIVE-24471
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: mahesh kumar behera
Assignee: mahesh kumar behera


In map-side group aggregation, a partial grouped aggregation is computed to 
reduce the data written to disk by the map task. In hash aggregation, where 
the input data is not sorted, a hash table is used. If the hash table grows 
beyond a configurable limit, its data is flushed to disk and a new hash table 
is created. If the reduction achieved by the hash table is less than the minimum 
hash aggregation reduction calculated at compile time, map-side aggregation is 
converted to streaming mode. So if the first few batches of records do not 
produce a significant reduction, the mode is switched to streaming. This can 
hurt performance if subsequent batches of records contain fewer distinct 
values. To mitigate this, a combiner can be added to the map task after the 
keys are sorted. This ensures that aggregation is done where possible and 
reduces the data written to disk.
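
The behaviour described above can be illustrated with a minimal, self-contained sketch. It is not Hive's operator code: the class name, method names, thresholds and the use of a simple count as the aggregate are all assumptions made for illustration. It shows hash mode giving up after a first batch with poor reduction, and a combiner over the sorted output recovering the aggregation that streaming mode missed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; all names and thresholds here are hypothetical.
class HashAggCombinerSketch {

    // Map-side aggregation (count per key): starts in hash mode and switches to
    // streaming mode if the first checkInterval rows do not reduce enough.
    static List<Map.Entry<String, Long>> mapSideAggregate(
            List<String> keys, int maxEntries, int checkInterval, double minReduction) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        Map<String, Long> table = new HashMap<>();
        boolean streaming = false;
        long rowsSeen = 0;
        for (String key : keys) {
            rowsSeen++;
            if (streaming) {
                // Streaming mode: every row is emitted as its own partial result.
                out.add(Map.entry(key, 1L));
                continue;
            }
            table.merge(key, 1L, Long::sum);
            if (rowsSeen == checkInterval && table.size() > rowsSeen * minReduction) {
                // Too many distinct keys in the first batch: give up on hashing.
                flush(table, out);
                streaming = true;
            } else if (table.size() >= maxEntries) {
                // Hash table is full: flush it and start a new one.
                flush(table, out);
            }
        }
        flush(table, out);
        return out;
    }

    // Moves all partial results from the hash table to the output and empties it.
    private static void flush(Map<String, Long> table, List<Map.Entry<String, Long>> out) {
        for (Map.Entry<String, Long> e : table.entrySet()) {
            out.add(Map.entry(e.getKey(), e.getValue()));
        }
        table.clear();
    }

    // Combiner: sorts the partial results by key and merges adjacent equal keys.
    static List<Map.Entry<String, Long>> combine(List<Map.Entry<String, Long>> partials) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(partials);
        sorted.sort(Map.Entry.comparingByKey());
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        String currentKey = null;
        long sum = 0;
        for (Map.Entry<String, Long> p : sorted) {
            if (currentKey != null && !currentKey.equals(p.getKey())) {
                out.add(Map.entry(currentKey, sum));
                sum = 0;
            }
            currentKey = p.getKey();
            sum += p.getValue();
        }
        if (currentKey != null) {
            out.add(Map.entry(currentKey, sum));
        }
        return out;
    }

    public static void main(String[] args) {
        // Four distinct keys first (poor reduction), then one key repeated six times.
        List<String> keys = List.of("a", "b", "c", "d", "e", "e", "e", "e", "e", "e");
        List<Map.Entry<String, Long>> partials = mapSideAggregate(keys, 100, 4, 0.5);
        System.out.println(partials.size());          // prints 10
        System.out.println(combine(partials).size()); // prints 5
    }
}
```

In this example the first four rows are all distinct, so the sketch switches to streaming and emits the six "e" rows unaggregated. The combiner then collapses them into a single record, which is the saving the proposal is after.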





[jira] [Created] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-02 Thread Cameron Moberg (Jira)
Cameron Moberg created HIVE-24470:
-

 Summary: Separate HiveMetastore Thrift and Driver logic
 Key: HIVE-24470
 URL: https://issues.apache.org/jira/browse/HIVE-24470
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Reporter: Cameron Moberg
Assignee: Cameron Moberg


In the file HiveMetaStore.java, the majority of the code is the Thrift interface 
rather than the actual logic behind starting the Hive metastore. This code should 
be moved out into a separate file to clean up HiveMetaStore.java.





[jira] [Created] (HIVE-24469) StatsTask failure while inserting the data into the table partitioned by timestamp

2020-12-02 Thread Rajkumar Singh (Jira)
Rajkumar Singh created HIVE-24469:
-

 Summary: StatsTask failure while inserting the data into the table 
partitioned by timestamp
 Key: HIVE-24469
 URL: https://issues.apache.org/jira/browse/HIVE-24469
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0
Reporter: Rajkumar Singh


Steps to repro:


{code:sql}
CREATE EXTERNAL TABLE `tblsource`(
  `x` int,
  `y` string)
STORED AS PARQUET;
CREATE EXTERNAL TABLE `tblinsert`(
  `x` int)
PARTITIONED BY (
  `y` timestamp)
STORED AS PARQUET;
insert into table tblsource values (5,'2020-11-06 00:00:00.000');
insert into table tblinsert partition(y) select * from tblsource distribute by (y);
{code}

The query fails while executing the stats task, and I can see the following exception in HMS:


{code:java}
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
    at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartColumnStatsWithMerge(HiveMetaStore.java:8629) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:8590) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_232]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_232]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_232]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_232]
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
    at com.sun.proxy.$Proxy28.set_aggr_stats_for(Unknown Source) ~[?:?]
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:18937) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:18921) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_232]
    at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_232]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[hadoop-common-3.1.1.7.2.0.0-237.jar:?]
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
{code}

I think the problem is with timestamps whose nanoseconds part is all zeros. After 
inserting the value 2020-11-06 00:00:00.000, Hive calls set_aggr_stats_for 
and constructs the SetPartitionsStatsRequest. During construction of the request, 
because the nanoseconds are all 0, the Hive FetchOperator converts 2020-11-06 
00:00:00.000 to 2020-11-06 00:00:00 (Timestamp.valueOf(string)).

https://github.com/apache/hive/blob/f8aa55f9c8f22c4fd293d9531192f7f46099a420/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L176

On the HMS side, the lookup at

https://github.com/apache/hive/blob/2ab194d25311e15487ae010b8dd113879ccd501b/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L8626

does not yield any partition, because the filter expression for the partition was 
2020-11-06 00:00:00. Hence it fails with the IndexOutOfBoundsException shown 
above.
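
The suspected mismatch can be reproduced in isolation with a small sketch. It uses only java.time, not Hive's own Timestamp class. The formatter below is an assumption that mimics the described behaviour, where fractional seconds are optional and are not printed when they are zero. The class name, the map standing in for the partition lookup, and the "y=" key format are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; it does not use Hive's Timestamp or metastore APIs.
class TimestampPartitionMismatch {

    // Assumed formatter: the fraction is optional on parse and omitted on print when it is zero.
    private static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd HH:mm:ss")
            .optionalStart()
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
            .optionalEnd()
            .toFormatter();

    // Parses a timestamp string and prints it back, dropping an all-zero fraction.
    static String normalize(String ts) {
        return LocalDateTime.parse(ts, FORMATTER).format(FORMATTER);
    }

    // Returns true if a partition stored under the original string is still found
    // when it is looked up by the normalized string.
    static boolean partitionFound(String stored) {
        Map<String, String> partitions = new HashMap<>();
        partitions.put("y=" + stored, "column stats");
        return partitions.containsKey("y=" + normalize(stored));
    }

    public static void main(String[] args) {
        System.out.println(normalize("2020-11-06 00:00:00.000"));      // prints 2020-11-06 00:00:00
        System.out.println(partitionFound("2020-11-06 00:00:00.000")); // prints false
        System.out.println(partitionFound("2020-11-06 00:00:00.123")); // prints true
    }
}
```

A value with a non-zero fraction round-trips unchanged and is found, while the all-zero fraction is dropped and the lookup misses. That is consistent with the empty partition list suspected above.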





[jira] [Created] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry

2020-12-02 Thread David Mollitor (Jira)
David Mollitor created HIVE-24468:
-

 Summary: Use Event Time instead of Current Time in Notification 
Log DB Entry
 Key: HIVE-24468
 URL: https://issues.apache.org/jira/browse/HIVE-24468
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor





