" define "year" - "IDDate" as a hierarchy", which is the benefit ? --> The combination "year" + "IDDate" has the same line number as the combination "IDDate"; so aggregate from the former to the latter will not aggregate much; Then we can prune the later with the "hierarchy" to reduce the cube size;
Nowadays, in derived columns, the Host column, is always the PK of table ? --> Yes 2016-12-10 20:25 GMT+08:00 Alberto Ramón <[email protected]>: > thanks for you clear explanation !! > > > The only point that I can't understand is > " define "year" - "IDDate" as a hierarchy", which is the benefit ? > [image: Imágenes integradas 1] > Where: > > - > > IDData is PK of Dim table, Unique & Identity > - Year is a Normal Dim --> I will have precalculated by years > > > Nowadays, in derived columns, the Host column, is always the PK of table ? > > > > > 2016-12-09 15:25 GMT+01:00 ShaoFeng Shi <[email protected]>: > >> Hi Albert, I think you're raising a good question; Many users face such >> questions when using Kylin in their cases. Let me try to share some my >> cents. >> >> "Derived" or "Joint" ? >> These are two independent means in Kylin (they're not conflict). Using >> which depends on how these dimensions being used I think; >> >> Take the "IDDate" case you mentioned as an example; If most of you >> queries are aggregated at the PK/FK level (which is date), and user just >> want to in passing other fields like its "MonthTxt", "DayWeekTxt", defining >> them as "derived" will be very good. >> >> But if you also want to aggregate also at "MonthTxt" or "DayWeekTxt" >> level, defining them as "Derived" might not be good; Because Kylin need >> translates the condition of "MonthTxt" into a set of PK values ("IDDate"), >> and then query from Cube with these values, because the cube only >> pre-aggreated at "IDDate"; This will slow down the query; (Ofcause if your >> dataset is small it still be acceptable) >> >> Besides, defining "Year" - "DayWeek_ID" - "Month_ID" as hierarchy is not >> suggested, because they are not a hierarchy relationship, but in parallel >> here; ("March" is not a child of "2016", it appears in every year) >> >> "Joint" can be used in two typical cases: >> 1) combine multiple ultra low cardinality dimensions >> 2) combine dimensions which has 1:1 (like "Month_ID" "Month_Txt") or >> close to 1:1 relationship (like "USER_ID", "USER_EMAIL") >> >> For case 1, I might design the cube in this way (assume you have the need >> to group by year, month, dayweek): >> 1) define all them as normal dimension >> 2) define "year" - "IDDate" as a hierarchy >> 3) define "Month_ID" + "Month_Txt" as a joint, because they're 1:1 >> 4) define "DayWeek_ID" + "DayWeek_Txt" as a joint, because they're 1:1 >> >> >> For case 2, I have the same suggestion as above. >> >> 2016-12-09 7:10 GMT+08:00 Alberto Ramón <[email protected]>: >> >>> Typical case 1: >>> >>> *IDDate* >>> >>> *Month_ID* >>> >>> *Month_Txt* >>> >>> *DayWeek_ID* >>> >>> *DayWeek_Txt* >>> >>> *Year* >>> >>> 2016-03-01 >>> >>> 3 >>> >>> March >>> >>> 2 >>> >>> Wendesday >>> >>> 2016 >>> >>> 2016-03-02 >>> >>> 3 >>> >>> March >>> >>> 3 >>> >>> Thursday >>> >>> 2016 >>> >>> 2016-03-02 >>> >>> 3 >>> >>> March >>> >>> 4 >>> >>> Friday >>> >>> 2016 >>> >>> IDDate is PK of Dim table and Unique >>> >>> >>> SOL 1: Uses Hierarchy and Derived from non PK column >>> >>> >>> *Month_ID* >>> >>> Hierarchy 2 >>> >>> Normal 1 >>> >>> *Month_Txt* >>> >>> >>> Derived 1 >>> >>> *DayWeek_ID* >>> >>> Hierarchy 3 >>> >>> Normal 2 >>> >>> *DayWeek_Txt* >>> >>> >>> Derived 2 >>> >>> *Year* >>> >>> Hierarchy 1 >>> >>> Normal 3 >>> >>> Year > Month > Day >>> >>> Text are derived from ID (in month and Week) >>> >>> PB1: KYLIN-444 <https://issues.apache.org/jira/browse/KYLIN-444> >>> >>> PB2: I don't know how create Derived column from non PK with actual UI >>> (Kylin >>> – 1313 <https://issues.apache.org/jira/browse/KYLIN-1313> v1.5.2 Kylin >>> 1786 <https://issues.apache.org/jira/browse/KYLIN-1786>, v1.5.3) >>> >>> >>> >>> SOL 2: >>> >>> *Month_ID* >>> >>> Hierarchy 2 >>> >>> Join 1 >>> >>> *Month_Txt* >>> >>> >>> Join 1 >>> >>> *DayWeek_ID* >>> >>> Hierarchy 3 >>> >>> Join 2 >>> >>> *DayWeek_Txt* >>> >>> >>> Join 2 >>> >>> *Year* >>> >>> Hierarchy 1 >>> >>> Normal 3 >>> >>> >>> SOL 2 is this the best solution ?? >>> >>> >>> >>> Typical case 2: >>> >>> I see the same scenario a lot of times (derived columns with 1:1 >>> Relation) >>> >>> Product_ID *(PK)* >>> >>> Product_TXT >>> >>> TypeProduct_ID >>> >>> TypeProduct_TXT >>> >>> Country_TXT >>> >>> Country_ID >>> >>> Optimize queries by product / category / country, are mandatory >>> >>> Perhaps, >>> >>> Country (lower cardinality) its a good candidate to Join >>> >>> I don't want put Product_TXT as Join, because is a longgggg text, and >>> can be affect Row_Key of HBase, but I need Queries like ... where >>> product_TXT = ""iRobot Roomba 650 Robotic Vacuum Cleaner >>> >>> suggestions ? >>> >> >> >> >> -- >> Best regards, >> >> Shaofeng Shi 史少锋 >> >> > -- Best regards, Shaofeng Shi 史少锋
