One more thing: a timestamp column usually has high cardinality. It is not a
good choice of partition key because it creates too many partitions.
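
A rough sketch of what I mean, combined with the partition-folder advice quoted below. The partition column name dt, the date value and the dt=2016-12-17 folder are only assumptions for illustration, and this assumes the table is created fresh rather than altered in place:

```sql
-- Hypothetical variant of user_tag partitioned by a coarse date string (dt)
-- instead of a raw timestamp. The name dt and the value used below are assumptions.
CREATE TABLE IF NOT EXISTS user_tag (
  rowkey STRING,
  cate1 STRING,
  cate2 STRING,
  cate3 STRING,
  cate4 STRING
)
PARTITIONED BY (dt STRING)
STORED AS ORC
LOCATION '/user/hive/warehouse/kylinlabel.db/user_tag';

-- After moving the ORC file into .../user_tag/dt=2016-12-17/,
-- register that folder as a partition so Hive can see the data.
ALTER TABLE user_tag ADD IF NOT EXISTS PARTITION (dt='2016-12-17')
LOCATION '/user/hive/warehouse/kylinlabel.db/user_tag/dt=2016-12-17';

-- Alternatively, ask Hive to discover partition folders that follow
-- the key=value naming layout under the table location.
MSCK REPAIR TABLE user_tag;
```

With one partition per day the partition count stays small, whereas one partition per distinct timestamp value could mean one folder per row.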

2016-12-17 23:34 GMT+09:00 Elliot West <tea...@gmail.com>:

> It looks as though your table is partitioned yet perhaps you haven't
> accounted for this when adding the data? Firstly it is good practice (and
> sometimes essential) to put the data into a partition folder of the form
> "timestamp='<partition value>'". You may then need to add the partition
> depending on how you are creating it. IIRC the Spark DataFrame/DataSet APIs
> have good support for adding partitions to existing Hive tables although
> there was a bug that prevented the creation of new partitioned tables when
> I looked some time ago. If you are manually managing the partitions you may
> need to issue an ADD PARTITION command using the Hive CLI:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddPartitions
>
> On Sat, 17 Dec 2016 at 08:07, 446463...@qq.com <446463...@qq.com> wrote:
>
>>
>> Hi All:
>> I created an ORC table in Hive:
>>
>> create table if not exists user_tag (
>> rowkey STRING ,
>> cate1 STRING ,
>> cate2 STRING ,
>> cate3 STRING ,
>> cate4 STRING
>> )
>> PARTITIONED BY (timestamp STRING)
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>> STORED AS orc
>> LOCATION '/user/hive/warehouse/kylinlabel.db/user_tag';
>>
>> Then I generated an ORC file in Spark and put it into the path
>> /user/hive/warehouse/kylinlabel.db/user_tag
>> This is the full file path:
>> /user/hive/warehouse/kylinlabel.db/user_tag/part-r-00000-920282f9-4d68-4af8-81c5-69522df3d374.orc
>> But I find there is no data in the user_tag table.
>> Why?
>>
>>
>> ------------------------------
>>
>> 446463...@qq.com
>>
>>
