Naveen Gangam created HIVE-22002:
------------------------------------

             Summary: Insert into table partition fails partially with 
stats.autogather is on.
                 Key: HIVE-22002
                 URL: https://issues.apache.org/jira/browse/HIVE-22002
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 4.0.0
            Reporter: Naveen Gangam


create table test_double(id int) partitioned by (dbtest double); 
insert into test_double partition(dbtest) values (1,9.9); --> this works
insert into test_double partition(dbtest) values (1,10); --> this fails 

But if we change it to use an explicit cast, it succeeds:
insert into test_double partition(dbtest) values (1, cast (10 as double));

The problem is only seen when inserting a whole number, e.g. 10, 10.0, 15 
or 14.0. It is not seen when inserting a number with a non-zero fractional 
part, so an insert of 10.1 goes through. 

The underlying exception from the HMS is 
{code}
2019-07-11T07:58:16,670 ERROR [pool-6-thread-196]: server.TThreadPoolServer (TThreadPoolServer.java:run(297)) - Error occurred during processing of message.
java.lang.IndexOutOfBoundsException: Index: 0
    at java.util.Collections$EmptyList.get(Collections.java:4454) ~[?:1.8.0_112]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartColumnStatsWithMerge(HiveMetaStore.java:7808) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:7769) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
{code}

With {{hive.stats.column.autogather=false}}, this exception does not occur with 
or without the explicit casting.

The issue stems from the fact that HS2 creates a partition with value 
{{dbtest=10}} for the table, while the stats processor attempts to add column 
statistics for a partition with value {{dbtest=10.0}}. HMS 
{{getPartitionsByNames}} therefore cannot find a partition with that value and 
fails to insert the stats. So while the failure surfaces on the HMS side, the 
cause is in the HS2 query planning.
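
The mismatch can be illustrated with a small standalone Java sketch. This is 
not Hive code: the class and both helpers are hypothetical, and the way each 
side renders the value is an assumption chosen only because it reproduces the 
names reported above ({{dbtest=10}} on the partition, {{dbtest=10.0}} in the 
stats lookup).

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;

public class PartitionNameSketch {

    // Hypothetical stand-in for the name the partition is created with.
    // ASSUMPTION: whole numbers lose their fractional part ("10.0" -> "10").
    static String createdPartitionName(String column, String literal) {
        String value = new BigDecimal(literal).stripTrailingZeros().toPlainString();
        return column + "=" + value;
    }

    // Hypothetical stand-in for the name the stats update looks up.
    // ASSUMPTION: the value is rendered as a Java double ("10" -> "10.0").
    static String statsPartitionName(String column, String literal) {
        return column + "=" + Double.toString(Double.parseDouble(literal));
    }

    // True when a lookup by the stats-side name finds the created partition.
    static boolean statsLookupSucceeds(String literal) {
        Set<String> partitions = new HashSet<>();
        partitions.add(createdPartitionName("dbtest", literal));
        return partitions.contains(statsPartitionName("dbtest", literal));
    }

    public static void main(String[] args) {
        for (String literal : new String[] {"9.9", "10.1", "10", "10.0", "15", "14.0"}) {
            System.out.println(literal
                + " created=" + createdPartitionName("dbtest", literal)
                + " stats=" + statsPartitionName("dbtest", literal)
                + " found=" + statsLookupSucceeds(literal));
        }
    }
}
```

Under these assumptions the lookup finds the partition for 9.9 and 10.1 but 
not for 10, 10.0, 15 or 14.0, which matches the behaviour described above. An 
empty lookup result is also consistent with the {{Collections$EmptyList.get}} 
frame in the HMS stack trace.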

It makes sense that turning off {{hive.stats.column.autogather}} avoids the 
issue, because the query plan then contains no StatsTask.

However, {{SHOW PARTITIONS}} shows the partition as created, while the query 
planner does not include it in any plan because the partition has no stats.



