Re: Consulting "EXTENDED_COLUMN"

ShaoFeng Shi Sun, 11 Dec 2016 18:32:40 -0800

Kylin encodes dimension values with a Dictionary (the default encoding)
or another encoding method when composing the rowkey, so the overhead will
be small in most cases.
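
As a rough illustration of why the encoded rowkey stays small, here is a minimal Python sketch of dictionary encoding. It is a toy model, not Kylin's actual Dictionary implementation: the helper names (build_dictionary, id_width, encode_rowkey) and the sample rows are hypothetical, and the ID widths are only an assumption about how a fixed-width encoding could behave.

```python
# Toy dictionary encoder: a sketch, NOT Kylin's real Dictionary code.
# All function names and sample data here are hypothetical.

def build_dictionary(values):
    """Map each distinct value to a small integer ID, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

def id_width(dictionary):
    """Bytes needed to store the largest ID (at least 1 byte)."""
    max_id = max(len(dictionary) - 1, 1)
    return (max_id.bit_length() + 7) // 8

def encode_rowkey(row, dictionaries):
    """Concatenate the fixed-width IDs of each dimension value."""
    parts = []
    for col, value in row.items():
        d = dictionaries[col]
        parts.append(d[value].to_bytes(id_width(d), "big"))
    return b"".join(parts)

rows = [
    {"WeekDayID": "1", "WeekDayTxt": "Monday"},
    {"WeekDayID": "2", "WeekDayTxt": "Tuesday"},
    {"WeekDayID": "3", "WeekDayTxt": "Wednesday"},
]

# One dictionary per dimension column.
dictionaries = {
    col: build_dictionary(r[col] for r in rows) for col in rows[0]
}

# Compare the raw UTF-8 size of the last row with its encoded rowkey size.
raw_size = sum(len(v.encode("utf-8")) for v in rows[2].values())
encoded = encode_rowkey(rows[2], dictionaries)
print(raw_size, len(encoded))
```

In this toy case the text column costs one byte in the rowkey regardless of how long the string is, which is why adding the Txt column as a second (joint) dimension is usually cheap.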


2016-12-02 17:59 GMT+08:00 Alberto Ramón <[email protected]>:

> Yes, I will accept this overhead in the rowkey.
>
> 2016-12-02 9:58 GMT+01:00 Billy(Yiming) Liu <[email protected]>:
>
>> Using Joint Dimension for your 1:1 relation is the right design.
>>
>> 2016-12-02 0:21 GMT+08:00 Alberto Ramón <[email protected]>:
>>
>>> Nice, Liu.
>>>
>>> We have some cases like:
>>> DayWeekTXT, DayWeekID
>>> MonthTXT, MonthID
>>>
>>> A small proposal:
>>> It would be interesting to be able to create a Derived dimension with a
>>> 1:1 relation that supports filters and GROUP BY.
>>>
>>> 2016-12-01 11:55 GMT+01:00 Billy(Yiming) Liu <[email protected]>:
>>>
>>>> The cost of a joint dimension compared with an extended column is that you
>>>> have more columns in the HBase rowkey, which may harm query performance. But
>>>> most of the time a joint dimension is still recommended, since a normal
>>>> dimension column supports many more functions than an extended column, such
>>>> as count(*).
>>>>
>>>> 2016-12-01 17:07 GMT+08:00 Alberto Ramón <[email protected]>:
>>>>
>>>>> Hello,
>>>>> I was preparing an email with related questions.
>>>>>
>>>>> Sometimes we have derived dimensions with a 1:1 relation, for example:
>>>>> WeekDayID & WeekDayTxt
>>>>> MonthID & MonthTxt
>>>>>
>>>>> SOL1: Derived. ID as Host column and Txt as Extended column.
>>>>> Problem: you can't filter or GROUP BY Txt.
>>>>>
>>>>> SOL2: Joint. Define tuples of ID & Txt.
>>>>> Any problems or limitations? (I need to test this option.)
>>>>>
>>>>> 2016-12-01 0:35 GMT+01:00 Billy(Yiming) Liu <[email protected]>:
>>>>>
>>>>>> Thanks, Alberto. The explanation is accurate. EXTENDED_COLUMN is only
>>>>>> used for display, not for filtering or grouping, which is done on the
>>>>>> HOST_COLUMN. So EXTENDED_COLUMN is not a dimension; it works like a
>>>>>> key/value map keyed by the HOST_COLUMN.
>>>>>>
>>>>>> If the values in EXTENDED_COLUMN are not long, you could just define
>>>>>> two dimensions in a joint dimension setting. It has almost the same
>>>>>> performance impact as EXTENDED_COLUMN, which removes one dimension, but
>>>>>> is easier to understand.
>>>>>>
>>>>>> 2016-11-30 19:00 GMT+08:00 Alberto Ramón <[email protected]>:
>>>>>>
>>>>>>> This will help you:
>>>>>>> http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
>>>>>>>
>>>>>>> The idea is always: how can I reduce the number of dimensions?
>>>>>>> If you reduce dimensions, the time and resources needed to build the
>>>>>>> cube and its final size decrease, which is good.
>>>>>>>
>>>>>>> An example is DIM_Persons: Id_Person, Name, Surname, Address, ...
>>>>>>>    Id_Person can be the Host Column,
>>>>>>>    and the other columns, which can be looked up from the ID, are
>>>>>>>    Extended Columns.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2016-11-30 11:35 GMT+01:00 仇同心 <[email protected]>:
>>>>>>>
>>>>>>> > Hi all,
>>>>>>> > I don’t understand the usage scenarios of EXTENDED_COLUMN, although
>>>>>>> > I have read this issue: “https://issues.apache.org/jira/browse/KYLIN-1313”.
>>>>>>> > What do the “Host Column” and “Extended Column” parameters mean?
>>>>>>> > Why use this expression, and what kind of optimization problem does
>>>>>>> > it solve?
>>>>>>> > Could you explain it with a SQL statement?
>>>>>>> >
>>>>>>> >
>>>>>>> > Thanks~
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> With Warm regards
>>>>>>
>>>>>> Yiming Liu (刘一鸣)
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> With Warm regards
>>>>
>>>> Yiming Liu (刘一鸣)
>>>>
>>>
>>>
>>
>>
>> --
>> With Warm regards
>>
>> Yiming Liu (刘一鸣)
>>
>
>


-- 
Best regards,

Shaofeng Shi 史少锋
