Re: New document: "How to optimize cube build"

Li Yang Sat, 04 Feb 2017 02:20:07 -0800

Feel free to update the document with different opinions. :-)

On Thu, Jan 26, 2017 at 11:34 AM, ShaoFeng Shi <[email protected]>
wrote:


> Hi Alberto,
>
> Thanks for your comments! In many cases the data is imported into Hadoop in
> T+1 mode. Especially when each day's data is tens of GB, it is
> reasonable to partition the Hive table by date. The question is whether it
> is worth keeping a long history of data in Hive. Usually users keep only a
> couple of months of data in Hive. If the number of partitions exceeds the
> threshold in Hive, they can easily remove the oldest partitions or move
> them to another table. I think that is a common practice with Hive, and it
> is very good to know that Hive 2.0 will solve this.
>
> 2017-01-25 17:10 GMT+08:00 Alberto Ramón <[email protected]>:
>
>> Be careful about partitioning by "FLIGHTDATE"
>>
>> From https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
>>
>> *"Option 1: Use id_date as the partition column on the Hive table. This has
>> a big problem: the Hive metastore is meant for a few hundred partitions, not
>> thousands (HIVE-9452: there is an idea to solve this, but it isn’t in progress)*"
>>
>> Hive 2.0 will include a preview (only for testing) to solve this
>>
>> 2017-01-25 9:46 GMT+01:00 ShaoFeng Shi <[email protected]>:
>>
>>> Hello,
>>>
>>> A new document has been added on cube build practices. Any suggestions
>>> or comments are welcome. We can update the doc later based on feedback.
>>>
>>> Here is the link:
>>> https://kylin.apache.org/docs16/howto/howto_optimize_build.html
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
