I believe there is an issue with non-string type partition values. On some code path point they are incorrectly compared as strings when a numeric comparison should be used instead. Consequently, as '04' ≠ '4' you get two different partitions. To work around this you should ensure that only one numerical partition key format is used: always strip leading zeros. I've had a look in the Hive JIRA and can find no related issues that haven't been fixed for Hive 2.1.0, so perhaps this is a new find. I'd suggest raising a new issue: https://issues.apache.org/jira/browse/HIVE
Thanks, Elliot. On Thu, 22 Dec 2016 at 07:49, 徐 鹏 <xupeng1...@outlook.com> wrote: > > > > > > > > > > > Hi all: > > > > > > > > > > > > > HQL: > > > Alter table OperatingStat_R_View Add IF NOT EXISTS > partition(YEAR=2016,MONTH=12,DAY=04) > > LOCATION > '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04'; > > > > INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION > (YEAR=2016,MONTH=12,DAY=04) > > SELECT > > '2016-12-04', > > EventID, > > PlatID, > > ProvinceID, > > CityID, > > RefID, > > '', > > '', > > '', > > COUNT(1), > > COUNT(DISTINCT DeviceID) > > FROM TCRecSys_ApplyData.OperatingStat_D_EventList > > WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01' > > GROUP BY EventID,PlatID,ProvinceID,CityID,RefID; > > > > expected result: > > > > > > > > - partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/ > TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04’ > > > > actual result: > > > > > > > > - partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/ > TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04' > - partition(YEAR=2016,MONTH=12,DAY=4) LOCATION '/data/ApplicationDep/ > TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4' > > > > desc format info: > > > > > > > hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition > (year=2016,month=12,day=04); > > OK > > # col_name data_type comment > > > > createdate string > > eventid string > > platid string > > provinceid string > > cityid string > > refid string > > def1 string > > def2 string > > def3 string > > pv int > > uv int > > > > # Partition Information > > # col_name data_type comment > > > > year int > > month int > > day int > > > > # Detailed Partition Information > > Partition Value: [2016, 12, 04] > > Database: tcrecsys_applydata > > Table: operatingstat_r_view > > CreateTime: Mon Dec 05 10:46:27 CST 2016 > > LastAccessTime: UNKNOWN > > Protect Mode: None > > Location: > hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04 > > Partition Parameters: > > COLUMN_STATS_ACCURATE false > > numFiles 0 > > numRows -1 > > rawDataSize -1 > > totalSize 0 > > transient_lastDdlTime 1480905987 > > > > # Storage Information > > SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > > InputFormat: org.apache.hadoop.mapred.TextInputFormat > > OutputFormat: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > > Compressed: No > > Num Buckets: -1 > > Bucket Columns: [] > > Sort Columns: [] > > Storage Desc Params: > > field.delim ^ > > serialization.format ^ > > Time taken: 0.485 seconds, Fetched: 48 row(s) > > hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition > (year=2016,month=12,day=4); > > OK > > # col_name data_type comment > > > > createdate string > > eventid string > > platid string > > provinceid string > > cityid string > > refid string > > def1 string > > def2 string > > def3 string > > pv int > > uv int > > > > # Partition Information > > # col_name data_type comment > > > > year int > > month int > > day int > > > > # Detailed Partition Information > > Partition Value: [2016, 12, 4] > > Database: tcrecsys_applydata > > Table: operatingstat_r_view > > CreateTime: Mon Dec 05 10:46:07 CST 2016 > > LastAccessTime: UNKNOWN > > Protect Mode: None > > Location: > hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4 > > Partition Parameters: > > COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"} > > numFiles 59 > > numRows 0 > > rawDataSize 0 > > totalSize 49869 > > transient_lastDdlTime 1480906019 > > > > # Storage Information > > SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > > InputFormat: org.apache.hadoop.mapred.TextInputFormat > > OutputFormat: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > > Compressed: No > > Num Buckets: -1 > > Bucket Columns: [] > > Sort Columns: [] > > Storage Desc Params: > > field.delim ^ > > serialization.format ^ > > Time taken: 0.087 seconds, Fetched: 48 row(s) > > > > drop the partition, both locations dropped: > > > > > > > hive> alter table TCRecSys_ApplyData.OperatingStat_R_View drop partition > (year=2016,month=12,day=4); > > Dropped the partition year=2016/month=12/day=04 > > Dropped the partition year=2016/month=12/day=4 > > > > > > > > > > > > > > > > now : > > > > > Alter table OperatingStat_R_View Add IF NOT EXISTS > partition(YEAR='2016',MONTH='12',DAY='04') > > LOCATION > '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04'; > > > > INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION > (YEAR='2016',MONTH='12',DAY='04') > > SELECT > > '2016-12-04', > > EventID, > > PlatID, > > ProvinceID, > > CityID, > > RefID, > > '', > > '', > > '', > > COUNT(1), > > COUNT(DISTINCT DeviceID) > > FROM TCRecSys_ApplyData.OperatingStat_D_EventList > > WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01' > > GROUP BY EventID,PlatID,ProvinceID,CityID,RefID; > > > > > > > > > > > > > > > > What’s problem? > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > > > >