Re: how to filter long tail data

杨浩 Wed, 06 Sep 2017 18:31:26 -0700

It's an elegant implementation. I have read the article
approximate-topn-measure
<http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/>, but
we run into some problems in our situation:


   1. The result is approximate. Our team supplies statistics data for
   our company, a big company, and we don't want our numbers to be
   challenged by our users.
   2. What we need is slightly different: we want to filter data after all
   cuboids are generated. Suppose we have the dimensions date, appId,
   appVersion and channel, and the measures dayActiveUseCount,
   dayNewUseCount, dayUseCount and 7dayActiveUseCount. Previously we filtered
   out rows whose dayActiveUseCount was less than 2. It's very hard to
   implement this with Top-N, but filtering on the default measure "_COUNT_"
   after all cuboids are generated may work.
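
A minimal sketch of why a Top-N rule does not express a threshold rule, in plain Python rather than Kylin code. The function names `top_n` and `at_least` and the sample values are hypothetical; `top_n` here is an exact Top-N over a single aggregated cuboid, which is simpler than the approximate Top-N measure described in the article.

```python
from typing import Dict, Tuple

# Hypothetical cuboid key: (date, appId, appVersion, channel).
Key = Tuple[str, str, str, str]


def top_n(rows: Dict[Key, int], n: int) -> Dict[Key, int]:
    """Keep the n rows with the largest measure value (exact, single-cuboid Top-N)."""
    ranked = sorted(rows.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])


def at_least(rows: Dict[Key, int], min_value: int) -> Dict[Key, int]:
    """Keep every row whose measure value is >= min_value (the threshold rule)."""
    return {key: value for key, value in rows.items() if value >= min_value}


# Made-up dayActiveUseCount values for one day, for illustration only.
rows: Dict[Key, int] = {
    ("2017-09-01", "app1", "1.0", "store"): 120,
    ("2017-09-01", "app1", "1.0", "web"): 1,
    ("2017-09-01", "app1", "1.1", "store"): 45,
    ("2017-09-01", "app2", "2.0", "store"): 3,
    ("2017-09-01", "app2", "2.0", "beta"): 1,
}

# The threshold rule keeps however many rows qualify; Top-N matches it only
# if n happens to equal that number, which is unknown before aggregation.
kept = at_least(rows, 2)
```

Here `at_least(rows, 2)` keeps three rows, and `top_n` reproduces that set only with n=3; on another day the matching n would be different.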

It seems we have to change the source code and add a parameter to
filter data by "_COUNT_" after all cuboids are generated.
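
A minimal sketch of such a post-build filter, in plain Python rather than Kylin's code. `min_count` stands in for the proposed parameter; it, the function names and the data are all hypothetical. It illustrates why the filter has to run on each cuboid after that cuboid has been aggregated from the full data: raw rows do not carry their group counts, and filtering a parent cuboid before deriving its children would make the coarser totals wrong.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# The four dimensions from the example above.
DIMENSIONS: Tuple[str, ...] = ("date", "appId", "appVersion", "channel")
Cuboid = Counter  # maps a dimension-value tuple to a _COUNT_-style row count


def build_cuboid(raw: List[Dict[str, str]], dims: Tuple[str, ...]) -> Cuboid:
    """Aggregate raw rows on the given dimensions, counting rows per key."""
    return Counter(tuple(row[d] for d in dims) for row in raw)


def build_all_cuboids(raw: List[Dict[str, str]]) -> Dict[Tuple[str, ...], Cuboid]:
    """Build every non-empty combination of DIMENSIONS from the unfiltered raw data."""
    return {
        dims: build_cuboid(raw, dims)
        for k in range(1, len(DIMENSIONS) + 1)
        for dims in combinations(DIMENSIONS, k)
    }


def filter_after_build(cuboids: Dict[Tuple[str, ...], Cuboid], min_count: int):
    """Hypothetical post-build step: drop keys whose count is below min_count, per cuboid."""
    return {
        dims: Counter({key: c for key, c in cuboid.items() if c >= min_count})
        for dims, cuboid in cuboids.items()
    }


def roll_up(parent: Cuboid, parent_dims: Tuple[str, ...], child_dims: Tuple[str, ...]) -> Cuboid:
    """Derive a child cuboid by summing the counts of a parent cuboid."""
    positions = [parent_dims.index(d) for d in child_dims]
    child: Cuboid = Counter()
    for key, count in parent.items():
        child[tuple(key[i] for i in positions)] += count
    return child


# Made-up raw rows: five records for app1 on one day.
raw = (
    [{"date": "d1", "appId": "app1", "appVersion": "1.0", "channel": "store"}] * 3
    + [{"date": "d1", "appId": "app1", "appVersion": "1.0", "channel": "web"}]
    + [{"date": "d1", "appId": "app1", "appVersion": "1.1", "channel": "web"}]
)

filtered = filter_after_build(build_all_cuboids(raw), min_count=2)
# Filtering the base cuboid first and then rolling up loses the tail rows' counts.
too_early = roll_up(filtered[DIMENSIONS], DIMENSIONS, ("date", "appId"))
```

With this data the filtered (date, appId) cuboid keeps its true count of 5, while `too_early` reports 3 because the two single-record rows were already dropped from the base cuboid.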


I have a question about the Top-N measure: does it also filter data for the
default measure _COUNT_, which is not part of the Top-N?



2017-09-05 15:28 GMT+08:00 ShaoFeng Shi:

> Cool, that is exactly the use case for Top-N.
>
> 2017-09-05 12:00 GMT+08:00 杨浩:
>
>> Thanks. We would like to try the Top-N measure. The "filter condition" filters
>> data at the source, but we want to filter the data after all cuboids are built,
>> because we can't know which data is long tail until the build is done.
>>
>>
>> 2017-09-04 11:01 GMT+08:00 ShaoFeng Shi:
>>
>>> The Top-N measure is aimed at filtering out long tail data. Besides, in the
>>> data model there is a "filter condition", where you can add a filtering
>>> condition to exclude that tail data.
>>>
>>> 2017-09-04 10:54 GMT+08:00 杨浩:
>>>
>>>> Okay, our team wants to use Kylin as an ETL tool, but there is a lot of
>>>> long tail data after building. Can this data be filtered directly by
>>>> Kylin, or do we have to make some changes to the code?
>>>>
>>>> 2017-09-03 19:42 GMT+08:00 Li Yang:
>>>>
>>>>> Please ask Kylin-related questions here.
>>>>>
>>>>> On Fri, Sep 1, 2017 at 2:47 PM, 杨浩 wrote:
>>>>>
>>>>> > If a metric value is less than 2, we don't want to store it in HBase.
>>>>> > How can we filter out the long tail data?
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
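
On why Top-N results are approximate: streaming top-k counters keep only a bounded number of candidate keys, so rare keys evict each other and can inherit inflated counts. The sketch below is the textbook Space-Saving algorithm (Metwally et al.), a common approach to approximate top-k; it is a generic illustration, not Kylin's Top-N implementation, and the class name is hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Approximate top-k counter that tracks at most `capacity` distinct keys."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}  # estimated count (may overestimate)
        self.errors: Dict[Hashable, int] = {}  # upper bound on the overestimate

    def add(self, key: Hashable, weight: int = 1) -> None:
        if key in self.counts:
            self.counts[key] += weight
        elif len(self.counts) < self.capacity:
            self.counts[key] = weight
            self.errors[key] = 0
        else:
            # Evict the smallest counter; the newcomer inherits its count as error.
            victim = min(self.counts, key=self.counts.__getitem__)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[key] = floor + weight
            self.errors[key] = floor

    def top(self, n: int) -> List[Tuple[Hashable, int]]:
        """Return up to n (key, estimated count) pairs, largest first."""
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


sketch = SpaceSaving(capacity=3)
for key in ["a"] * 5 + ["b"] * 3 + ["c", "d", "e"]:
    sketch.add(key)
```

After this stream, `sketch.top(1)` is `[("a", 5)]`, but "e", which occurred only once, is reported with a count of 3 and an error bound of 2, tied with "b". This is the kind of approximation the first concern in this thread is about.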
