In this case, if a user runs a query whose WHERE clause has 2 dimensions from the "aggregation group" and 2 dimensions from the "other 5 dimensions", Kylin will compute the results from the base cuboid, correct? Or would it error out?
I can test it myself but I am being lazy -:) Looking for a quick answer from the experts. Thanks for your help.

On Sun, Feb 12, 2017 at 3:04 AM, ShaoFeng Shi <[email protected]> wrote:

> Ajay,
>
> There is no such setting, but the "aggregation group" offers something
> similar. Say the cube has 15 dimensions in total, but in the agg group you
> only pick 10 of them; then Kylin will build 1 (base cuboid) +
> 2^10 - 1 (combinations of the 10 dimensions) cuboids in total. This way you
> can leave the other 5 dimensions to appear only on the base cuboid.
>
> 2017-02-09 9:20 GMT+08:00 Ajay Chitre <[email protected]>:
>
>> My question was a general question. Not any specific issue that I am
>> encountering -:)
>>
>> I understand that we can prune by using Hierarchical dimensions,
>> aggregation groups etc. But what if these types of aggregations are not
>> possible?
>>
>> Let's say I have 15 dimensions (& I can't prune any), would Kylin build
>> 32,767 Cuboids or is there a property to say... "If no. of dimensions is
>> over X, stop building more Cuboids. Get from the base"? (Knowing this will
>> slow down the queries.)
>>
>> Please let me know. Thanks.
>>
>> On Mon, Feb 6, 2017 at 5:43 AM, ShaoFeng Shi <[email protected]>
>> wrote:
>>
>>> Ajay, thanks for your feedback;
>>>
>>> For question 1, the code has been merged into the master branch; the next
>>> release will be 2.0, and a beta release will be published soon.
>>>
>>> For question 2, yes, your understanding is correct: an N-dimension FULL
>>> cube will have 2^N - 1 cuboids; but if you adopt something like hierarchy,
>>> joint, or separating dimensions into multiple groups, it will be a
>>> "partial" cube, which means some cuboids will be pruned.
>>>
>>> If a query uses dimensions across aggregation groups, then only the base
>>> cuboid can fulfill it; Kylin has to do the post-aggregation from the base
>>> cuboid, and the performance will be degraded. Please check whether this is
>>> the case on your side.
>>>
>>> On Mon, Feb 6, 2017 at 2:05 PM +0900, "Ajay Chitre" <[email protected]>
>>> wrote:
>>>
>>>> Thanks for writing this document. It's very helpful. I have the following
>>>> questions:
>>>>
>>>> 1) Doc says... "Kylin will build dictionaries in memory (in next
>>>> version this will be moved to MR)".
>>>>
>>>> Which version can we expect this in? For large Cubes this process takes
>>>> a long time on a local machine. We really need to move this to the Hadoop
>>>> cluster. In fact, it would be great if we could have an option to run this
>>>> under Spark -:)
>>>>
>>>> 2) About the "Build N-Dimension Cuboid" step.
>>>>
>>>> Does Kylin build ALL Cuboids? My understanding is:
>>>>
>>>> Total no. of Cuboids = (2 to the power of # of dimensions) - 1
>>>>
>>>> Correct?
>>>>
>>>> So if there are 7 dimensions, there will be 127 Cuboids, right? Does
>>>> Kylin create ALL of them?
>>>>
>>>> I was under the impression that, after some point, Kylin will just get
>>>> measures from the Base Cuboid instead of building all of them. Please
>>>> explain.
>>>>
>>>> Thanks.
>>>>
>>>> On Sat, Feb 4, 2017 at 2:19 AM, Li Yang <[email protected]> wrote:
>>>>
>>>>> Feel free to update the document with different opinions. :-)
>>>>>
>>>>> On Thu, Jan 26, 2017 at 11:34 AM, ShaoFeng Shi <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Alberto,
>>>>>>
>>>>>> Thanks for your comments! In many cases the data is imported to
>>>>>> Hadoop in T+1 mode. Especially when each day's data is tens of GB, it is
>>>>>> reasonable to partition the Hive table by date.
>>>>>> The problem is whether it is worth keeping a long history of data in
>>>>>> Hive. Usually users only keep a couple of months' data in Hive. If the
>>>>>> partition number exceeds the threshold in Hive, they can remove the
>>>>>> oldest partitions or move them to another table easily. That is a
>>>>>> common practice with Hive, I think, and it is very good to know that
>>>>>> Hive 2.0 will solve this.
>>>>>>
>>>>>> 2017-01-25 17:10 GMT+08:00 Alberto Ramón <[email protected]>:
>>>>>>
>>>>>>> Be careful about partitioning by "FLIGHTDATE"
>>>>>>>
>>>>>>> From https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
>>>>>>>
>>>>>>> *"Option 1: Use id_date as partition column on Hive table. This have
>>>>>>> a big problem: the Hive metastore is meant for few hundred of partitions
>>>>>>> not thousand (Hive 9452 there is an idea to solve this isn’t in
>>>>>>> progress)"*
>>>>>>>
>>>>>>> In Hive 2.0 there will be a preview (only for testing) to solve this
>>>>>>>
>>>>>>> 2017-01-25 9:46 GMT+01:00 ShaoFeng Shi <[email protected]>:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> A new document has been added on cube build practices. Any
>>>>>>>> suggestion or comment is welcome. We can update the doc later with
>>>>>>>> feedback.
>>>>>>>>
>>>>>>>> Here is the link:
>>>>>>>> https://kylin.apache.org/docs16/howto/howto_optimize_build.html
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Shaofeng Shi 史少锋
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>>
>>>>>> Shaofeng Shi 史少锋
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
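
To make the cuboid arithmetic discussed in this thread concrete, here is a minimal Python sketch. It only models what the replies describe: a full cube with N dimensions has 2^N - 1 cuboids, one aggregation group of k dimensions gives 1 + (2^k - 1) cuboids, and a query whose dimensions do not all fall inside the group is answered from the base cuboid. It is not Kylin code; the function names and the dimension names d1..d15 are hypothetical, and it assumes a single aggregation group that is a strict subset of the cube's dimensions.

```python
def full_cube_cuboids(n_dims):
    """Cuboids in a full cube with n_dims dimensions: 2^N - 1 (per the thread)."""
    return 2 ** n_dims - 1


def agg_group_cuboids(n_group_dims):
    """Base cuboid plus every non-empty combination of the group's dimensions.

    Assumption: one aggregation group that is a strict subset of the cube's
    dimensions, so the base cuboid is not among the group's combinations.
    """
    return 1 + (2 ** n_group_dims - 1)


def serving_cuboid(query_dims, group_dims):
    """Simplified model of the routing described in the thread.

    This is NOT Kylin's real query planner: it only checks whether every
    dimension used by the query belongs to the aggregation group.
    """
    if set(query_dims) <= set(group_dims):
        return "group cuboid"
    # Dimensions outside the group exist only on the base cuboid,
    # so the result needs post-aggregation from there.
    return "base cuboid"


# Hypothetical cube: 15 dimensions, the first 10 in the aggregation group.
all_dims = [f"d{i}" for i in range(1, 16)]
group = all_dims[:10]

print(full_cube_cuboids(7))    # 127
print(full_cube_cuboids(15))   # 32767
print(agg_group_cuboids(10))   # 1024
print(serving_cuboid(["d1", "d2"], group))                # group cuboid
print(serving_cuboid(["d1", "d2", "d11", "d12"], group))  # base cuboid
```

Under this model, the query from the top of the thread (2 dimensions from the group plus 2 of the other 5) would be served from the base cuboid with post-aggregation rather than failing, which matches the earlier reply about queries that span aggregation groups. It is still worth confirming on a real cube.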
