Re: how to filter long tail data

2017-09-06 Thread 杨浩
It's an elegant implementation. I have read the article
approximate-topn-measure
<http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/>, but
we run into some problems in our situation:

   1. The result is approximate. Our team supplies statistics for our
   company, a big company, and we don't want to be challenged by our users.
   2. It is a little different from filtering data after all cuboids are
   generated. Suppose we have the dimensions date, appId, appVersion,
   channel and the measures dayActiveUseCount, dayNewUseCount, dayUseCount,
   7dayActiveUseCount, and we want to filter out data whose
   dayActiveUseCount is less than 2. It is very hard to implement this with
   Top-N, but using the default measure "_COUNT_" to filter data after all
   cuboids are generated may be OK.

It seems we have to change the source code and supply a parameter to
filter data by "_COUNT_" after all cuboids are generated.


I have a question about the TopN measure: does it also filter data for the
default measure _COUNT_, which is not in the TopN?
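
To make the requirement concrete, here is a minimal Python sketch of
"aggregate first, then drop rows below a threshold". It is only an
illustration of the idea, not Kylin code: the helper names aggregate and
drop_long_tail and the sample rows are hypothetical.

```python
from collections import defaultdict


def aggregate(rows, dims, measures):
    """Sum each measure per distinct combination of dimension values."""
    totals = defaultdict(lambda: dict.fromkeys(measures, 0))
    for row in rows:
        key = tuple(row[d] for d in dims)
        for m in measures:
            totals[key][m] += row[m]
    return dict(totals)


def drop_long_tail(aggregated, measure, min_value):
    """Keep only aggregated rows whose measure reaches min_value."""
    return {k: v for k, v in aggregated.items() if v[measure] >= min_value}


# Hypothetical source rows, one per event.
rows = [
    {"date": "2017-09-01", "appId": "a", "channel": "web", "dayActiveUseCount": 1},
    {"date": "2017-09-01", "appId": "a", "channel": "web", "dayActiveUseCount": 1},
    {"date": "2017-09-01", "appId": "b", "channel": "web", "dayActiveUseCount": 1},
]

agg = aggregate(rows, ["date", "appId", "channel"], ["dayActiveUseCount"])
kept = drop_long_tail(agg, "dayActiveUseCount", 2)
print(kept)
```

Only the combination for appId "a" survives, because its aggregated
dayActiveUseCount reaches 2; the combination for appId "b" is dropped.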





Re: how to filter long tail data

2017-09-05 Thread ShaoFeng Shi
Cool, that is exactly the use case for Top-N.



-- 
Best regards,

Shaofeng Shi 史少锋


Re: how to filter long tail data

2017-09-04 Thread 杨浩
Thanks. We would like to try the Top-N measure. The "filter condition"
filters data at the source, but we want to filter the data after all
cuboids are built, because we don't know which data is long tail until
after building.
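
The difference between the two kinds of filter can be sketched in a few
lines of Python. This is an illustration only, with hypothetical event data;
it does not use any Kylin API.

```python
from collections import Counter

# Hypothetical source data: one row per event, identified by app.
events = ["app_a", "app_a", "app_a", "app_b", "app_c", "app_c"]

# A source-level filter sees one row at a time, so it can only test the
# row's own column values. It needs to know "app_b" in advance.
source_filtered = [e for e in events if e != "app_b"]

# A post-aggregation filter can test the counts, which exist only after
# the rows have been grouped.
counts = Counter(events)
kept = {app: n for app, n in counts.items() if n >= 2}
print(kept)
```

Here app_b turns out to be long tail only once the counts are known, which
is why a condition on the source rows cannot express "count less than 2".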




Re: how to filter long tail data

2017-09-03 Thread ShaoFeng Shi
The Top-N measure is aimed at filtering long tail data. Besides, in the
data model there is a "filter condition", where you can add a filtering
condition to exclude those tail data.
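
As a rough illustration of how a top-N cut differs from a fixed threshold,
here is a small Python sketch. It computes an exact top N with the standard
library, whereas Kylin's Top-N measure is approximate; the counts are
hypothetical.

```python
from collections import Counter

# Hypothetical aggregated counts per app.
counts = Counter({"app_a": 50, "app_b": 7, "app_c": 3, "app_d": 1, "app_e": 1})

# Top-N keeps a fixed number of entries, whatever their values are.
top_2 = counts.most_common(2)

# A threshold keeps every entry that reaches the minimum value.
above_threshold = {app: n for app, n in counts.items() if n >= 2}

print(top_2)
print(above_threshold)
```

With N = 2, app_c is dropped even though its count is above 2, while the
threshold keeps it. Both approaches drop the tail entries app_d and app_e.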



-- 
Best regards,

Shaofeng Shi 史少锋


Re: how to filter long tail data

2017-09-03 Thread Li Yang
Please ask Kylin-related questions here.



how to filter long tail data

2017-09-01 Thread 杨浩
If an index is less than 2, we don't want to store it in HBase. How can we
filter out the long tail data?