Re: Use derived or Joint

Alberto Ramón Sat, 10 Dec 2016 04:25:47 -0800

thanks for you clear explanation !!


The only point that I can't understand is
" define "year" - "IDDate" as a hierarchy", which is the benefit ?
      [image: Imágenes integradas 1]
Where:

   -

   IDData is PK of Dim table, Unique & Identity
   - Year is a Normal Dim --> I will have precalculated by years


Nowadays, in derived columns, the Host column, is always the PK of table ?




2016-12-09 15:25 GMT+01:00 ShaoFeng Shi <[email protected]>:

> Hi Albert, I think you're raising a good question; Many users face such
> questions when using Kylin in their cases. Let me try to share some my
> cents.
>
> "Derived" or "Joint" ?
> These are two independent means in Kylin (they're not conflict). Using
> which depends on how these dimensions being used I think;
>
> Take the "IDDate" case you mentioned as an example; If most of you queries
> are aggregated at the PK/FK level (which is date), and user just want to in
> passing other fields like its "MonthTxt", "DayWeekTxt", defining them as
> "derived" will be very good.
>
> But if you also want to aggregate also at "MonthTxt" or "DayWeekTxt"
> level, defining them as "Derived" might not be good; Because Kylin need
> translates the condition of "MonthTxt" into a set of PK values ("IDDate"),
> and then query from Cube with these values, because the cube only
> pre-aggreated at "IDDate"; This will slow down the query; (Ofcause if your
> dataset is small it still be acceptable)
>
> Besides, defining "Year" - "DayWeek_ID" - "Month_ID" as hierarchy is not
> suggested, because they are not a hierarchy relationship, but in parallel
> here;  ("March" is not a child of "2016", it appears in every year)
>
> "Joint" can be used in two typical cases:
> 1) combine multiple ultra low cardinality dimensions
> 2) combine dimensions which has 1:1 (like "Month_ID" "Month_Txt") or close
> to 1:1 relationship (like "USER_ID", "USER_EMAIL")
>
> For case 1, I might design the cube in this way (assume you have the need
> to group by year, month, dayweek):
> 1) define all them as normal dimension
> 2) define "year" - "IDDate" as a hierarchy
> 3) define "Month_ID" + "Month_Txt" as a joint, because they're 1:1
> 4) define "DayWeek_ID" + "DayWeek_Txt" as a joint, because they're 1:1
>
>
> For case 2, I have the same suggestion as above.
>
> 2016-12-09 7:10 GMT+08:00 Alberto Ramón <[email protected]>:
>
>> Typical case 1:
>>
>> *IDDate*
>>
>> *Month_ID*
>>
>> *Month_Txt*
>>
>> *DayWeek_ID*
>>
>> *DayWeek_Txt*
>>
>> *Year*
>>
>> 2016-03-01
>>
>> 3
>>
>> March
>>
>> 2
>>
>> Wendesday
>>
>> 2016
>>
>> 2016-03-02
>>
>> 3
>>
>> March
>>
>> 3
>>
>> Thursday
>>
>> 2016
>>
>> 2016-03-02
>>
>> 3
>>
>> March
>>
>> 4
>>
>> Friday
>>
>> 2016
>>
>> IDDate is PK of Dim table and Unique
>>
>>
>> SOL 1: Uses Hierarchy and Derived from non PK column
>>
>>
>> *Month_ID*
>>
>> Hierarchy 2
>>
>> Normal 1
>>
>> *Month_Txt*
>>
>>
>> Derived 1
>>
>> *DayWeek_ID*
>>
>> Hierarchy 3
>>
>> Normal 2
>>
>> *DayWeek_Txt*
>>
>>
>> Derived 2
>>
>> *Year*
>>
>> Hierarchy 1
>>
>> Normal 3
>>
>> Year > Month > Day
>>
>> Text are derived from ID (in month and Week)
>>
>> PB1: KYLIN-444 <https://issues.apache.org/jira/browse/KYLIN-444>
>>
>> PB2: I don't know how create Derived column from non PK with actual UI (Kylin
>> – 1313 <https://issues.apache.org/jira/browse/KYLIN-1313> v1.5.2 Kylin
>> 1786 <https://issues.apache.org/jira/browse/KYLIN-1786>, v1.5.3)
>>
>>
>>
>> SOL 2:
>>
>> *Month_ID*
>>
>> Hierarchy 2
>>
>> Join 1
>>
>> *Month_Txt*
>>
>>
>> Join 1
>>
>> *DayWeek_ID*
>>
>> Hierarchy 3
>>
>> Join 2
>>
>> *DayWeek_Txt*
>>
>>
>> Join 2
>>
>> *Year*
>>
>> Hierarchy 1
>>
>> Normal 3
>>
>>
>> SOL 2 is this the best solution ??
>>
>>
>>
>> Typical case 2:
>>
>> I see the same scenario a lot of times (derived columns with 1:1 Relation)
>>
>> Product_ID *(PK)*
>>
>> Product_TXT
>>
>> TypeProduct_ID
>>
>> TypeProduct_TXT
>>
>> Country_TXT
>>
>> Country_ID
>>
>> Optimize queries by product / category / country, are mandatory
>>
>> Perhaps,
>>
>> Country (lower cardinality) its a good candidate to Join
>>
>> I don't want put Product_TXT as Join, because is a longgggg text, and can
>> be affect Row_Key of HBase, but I need Queries like ... where product_TXT =
>> ""iRobot Roomba 650 Robotic Vacuum Cleaner
>>
>> suggestions ?
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>

Re: Use derived or Joint

Reply via email to