Ajay,

There is no such setting, but the "aggregation group" offers something similar. Say the cube has 15 dimensions in total, but you pick only 10 of them for the agg group. Kylin will then build 1 (base cuboid) + 2^10 - 1 (combinations of those 10 dimensions) = 1,024 cuboids. This way, the other 5 dimensions appear only in the base cuboid.
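To make the arithmetic concrete, here is a minimal Python sketch of the counts discussed in this thread. It is plain arithmetic, not a Kylin API: the function names are made up for illustration, and it assumes a single aggregation group with no hierarchy, joint, or mandatory dimensions (each of which would prune further).

```python
def full_cube_cuboids(n_dims: int) -> int:
    """Cuboids in a full cube: every non-empty combination of n_dims dimensions."""
    if n_dims < 1:
        raise ValueError("a cube needs at least one dimension")
    return 2 ** n_dims - 1


def single_agg_group_cuboids(n_total: int, n_group: int) -> int:
    """Cuboids when one aggregation group includes n_group of n_total dimensions.

    Assumption (hypothetical simplification): one group only, with no
    hierarchy, joint, or mandatory rules applied inside it.
    """
    if not 0 < n_group <= n_total:
        raise ValueError("n_group must be between 1 and n_total")
    if n_group == n_total:
        # The group covers every dimension, so the base cuboid is already
        # one of the combinations: this is just the full cube.
        return full_cube_cuboids(n_total)
    # Base cuboid (all n_total dimensions) plus the combinations of the group.
    return 1 + full_cube_cuboids(n_group)


if __name__ == "__main__":
    print(full_cube_cuboids(15))             # full cube over 15 dimensions
    print(single_agg_group_cuboids(15, 10))  # 10 of the 15 dimensions in the group
```

With 15 dimensions the full cube comes to 32,767 cuboids, while the 10-dimension group brings it down to 1,024, at the cost of queries on the remaining 5 dimensions being answered from the base cuboid.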
2017-02-09 9:20 GMT+08:00 Ajay Chitre <[email protected]>:

> My question was a general question. Not any specific issue that I am
> encountering -:)
>
> I understand that we can prune by using Hierarchical dimensions,
> aggregation groups etc. But what if these types of aggregations are not
> possible.
>
> Let's say I've 15 dimensions (& I can't prune any), would Kylin build
> 32,766 Cuboids or is there a property to say... "If no. of dimensions are
> over X, stop building more Cuboids. Get from the base"? (Knowing this will
> slow down the queries).
>
> Please let me know. Thanks.
>
> On Mon, Feb 6, 2017 at 5:43 AM, ShaoFeng Shi <[email protected]>
> wrote:
>
>> Ajay, thanks for your feedback;
>>
>> For question 1, the code has been merged in master branch; next release
>> would be 2.0; a beta release will be published soon.
>>
>> For question 2, yes your understanding is correct: a N dim FULL cube will
>> have 2^N - 1 cuboids; but if you adopted some way like hierarchy, joint or
>> separating dimensions to multi groups, it will be a "partial" cube which
>> means some cuboids will be pruned.
>>
>> If a query uses dimensions across aggregation groups, then only the base
>> cuboid can fulfill it, kylin has to do the post aggregation from the base
>> cuboid, the performance would be downgraded. Please check whether it's this
>> case in your side.
>>
>> Get Outlook for iOS <https://aka.ms/o0ukef>
>>
>> On Mon, Feb 6, 2017 at 2:05 PM +0900, "Ajay Chitre" <
>> [email protected]> wrote:
>>
>> Thanks for writing this document. It's very helpful. I've following
>>> questions:
>>>
>>> 1) Doc says... "Kylin will build dictionaries in memory (in next version
>>> this will be moved to MR)".
>>>
>>> Which version can we expect this in? For large Cubes this process takes
>>> a long time on local machine. We really need to move this to the Hadoop
>>> cluster. In fact, it will be great if we can have an option to run this
>>> under Spark -:)
>>>
>>> 2) About the "Build N-Dimension Cuboid" step.
>>>
>>> Does Kylin build ALL Cuboids? My understanding is:
>>>
>>> Total no. of Cuboids = (2 to the power of # of dimensions) - 1
>>>
>>> Correct?
>>>
>>> So if there are 7 dimensions, there will be 127 Cuboids, right? Does
>>> Kylin create ALL of them?
>>>
>>> I was under the impression that, after some point, Kylin will just get
>>> measures from the Base Cuboid; instead of building all of them. Please
>>> explain.
>>>
>>> Thanks.
>>>
>>> On Sat, Feb 4, 2017 at 2:19 AM, Li Yang <[email protected]> wrote:
>>>
>>>> Be free to update the document with different opinions. :-)
>>>>
>>>> On Thu, Jan 26, 2017 at 11:34 AM, ShaoFeng Shi <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Alberto,
>>>>>
>>>>> Thanks for your comments! In many cases the data is imported to Hadoop
>>>>> in T+1 mode. Especially when everyday's data is tens of GB, it is
>>>>> reasonable to partition the Hive table by date. The problem is whether it
>>>>> worth to keep a long history data in Hive; Usually user only keep a couple
>>>>> monthes' data in Hive; If the partition number exceeds the threshold in
>>>>> Hive, he/she can remove the oldest partitions or move to another table
>>>>> easily; That is a common practice of Hive I think, and it is very good to
>>>>> know that Hive 2.0 will solve this.
>>>>>
>>>>> 2017-01-25 17:10 GMT+08:00 Alberto Ramón <[email protected]>:
>>>>>
>>>>>> Be careful about partition by "FLIGHTDATE"
>>>>>>
>>>>>> From https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
>>>>>>
>>>>>> *"Option 1: Use id_date as partition column on Hive table. This have
>>>>>> a big problem: the Hive metastore is meant for few hundred of partitions
>>>>>> not thousand (Hive 9452 there is an idea to solve this isn’t in
>>>>>> progress)*
>>>>>> "
>>>>>>
>>>>>> In Hive 2.0 will be a preview (only for testing) to solve this
>>>>>>
>>>>>> 2017-01-25 9:46 GMT+01:00 ShaoFeng Shi <[email protected]>:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> A new document is added for the practices of cube build. Any
>>>>>>> suggestion or comment is welcomed. We can update the doc later with
>>>>>>> feedbacks;
>>>>>>>
>>>>>>> Here is the link:
>>>>>>> https://kylin.apache.org/docs16/howto/howto_optimize_build.html
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Shaofeng Shi 史少锋
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Shaofeng Shi 史少锋
>>>>>
>>>>
>>>
>

--
Best regards,

Shaofeng Shi 史少锋
