With such a small dataset, the by-partition scan might be slower than a full
table scan. Try the same test on a really large data set, for example hundreds
of GB.

You can also refer to this post:
http://blog.cloudera.com/blog/2014/08/improving-query-performance-using-partitioning-in-apache-hive/
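
To illustrate why partitioning only pays off on large data, here is a minimal
Python sketch of partition pruning. It is not Hive or Kylin code: the toy
table, the function names, and the half-open date range [start, end) are all
assumptions made for the example.

```python
from datetime import date

# Hypothetical toy table: partition value (a date) -> rows in that partition.
partitions = {
    date(2016, 5, 1): [("a", 10), ("b", 20)],
    date(2016, 5, 2): [("c", 30)],
    date(2016, 5, 3): [("d", 40), ("e", 50)],
}


def full_scan(partitions, start, end):
    """Read every row, then filter by date (unpartitioned behaviour)."""
    rows_read = 0
    selected = []
    for day, rows in partitions.items():
        for row in rows:
            rows_read += 1  # every row is read, even if it is discarded
            if start <= day < end:
                selected.append(row)
    return selected, rows_read


def pruned_scan(partitions, start, end):
    """Skip partitions outside [start, end) before reading any row."""
    rows_read = 0
    selected = []
    for day, rows in partitions.items():
        if not (start <= day < end):
            continue  # whole partition skipped without touching its rows
        for row in rows:
            rows_read += 1
            selected.append(row)
    return selected, rows_read


if __name__ == "__main__":
    start, end = date(2016, 5, 2), date(2016, 5, 3)
    print(full_scan(partitions, start, end))
    print(pruned_scan(partitions, start, end))
```

Both functions return the same rows; only the number of rows read differs.
On a table this small the saving is negligible next to the fixed cost of
launching a job, which fits the timings reported below.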

2016-05-13 8:30 GMT+08:00 Mars J <[email protected]>:

> I have tested it: I created a partitioned Hive table and used the same
> partition column in Hive and Kylin, but the flat table creation step took
> longer than without the partitioned table. In my test the data is very
> small: without the partitioned table it takes 2.31 mins and the data size
> is 130.96 MB; with the partitioned table it takes 3.17 mins and the data
> size is 21.62 MB (these 2 build processes have the same start date and
> different end dates).
>
> 2016-04-09 15:43 GMT+08:00 ShaoFeng Shi <[email protected]>:
>
>> It is recommended to use the same partition column in Hive and Kylin;
>> that gives better performance in the flat table generation step, but
>> it is not required.
>>
>> 2016-04-09 9:36 GMT+08:00 Mars J <[email protected]>:
>>
>>> Hi,
>>>
>>>        Should Hive fact tables and dimension tables be partitioned by
>>> a date column when building incrementally by date?
>>>
>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi
>>
>>
>


-- 
Best regards,

Shaofeng Shi
