Feel free to update the document with different opinions. :-)

On Thu, Jan 26, 2017 at 11:34 AM, ShaoFeng Shi <[email protected]> wrote:
> Hi Alberto,
>
> Thanks for your comments! In many cases the data is imported to Hadoop in
> T+1 mode. Especially when each day's data is tens of GB, it is
> reasonable to partition the Hive table by date. The question is whether it
> is worth keeping a long history of data in Hive. Usually users keep only a
> couple of months' data in Hive. If the number of partitions exceeds the
> threshold in Hive, they can easily remove the oldest partitions or move
> them to another table. That is a common Hive practice, I think, and it is
> very good to know that Hive 2.0 will solve this.
>
> 2017-01-25 17:10 GMT+08:00 Alberto Ramón <[email protected]>:
>
>> Be careful about partitioning by "FLIGHTDATE".
>>
>> From https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
>>
>> *"Option 1: Use id_date as partition column on Hive table. This have a
>> big problem: the Hive metastore is meant for few hundred of partitions not
>> thousand (Hive 9452 there is an idea to solve this isn’t in progress)*"
>>
>> Hive 2.0 will include a preview (for testing only) that addresses this.
>>
>> 2017-01-25 9:46 GMT+01:00 ShaoFeng Shi <[email protected]>:
>>
>>> Hello,
>>>
>>> A new document has been added on cube build practices. Any suggestions
>>> or comments are welcome. We can update the doc later based on feedback.
>>>
>>> Here is the link:
>>> https://kylin.apache.org/docs16/howto/howto_optimize_build.html
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
