Re: how to filter long tail data
It's an elegant implementation. I have read the article approximate-topn-measure <http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/>, and we see some problems in our situation:

1. The result is approximate. Our team supplies statistics for our company, a big company, and we don't want to be challenged by our users.
2. It differs a little from filtering data after all cuboids are generated. Suppose we have the dimensions date, appId, appVersion and channel, and the measures dayActiveUseCount, dayNewUseCount, dayUseCount and 7dayActiveUseCount, and we want to filter out the data whose dayActiveUseCount is less than 2 before it is stored. This is very hard to implement with Top-N, but filtering by the default measure "_COUNT_" after all cuboids are generated may be OK.

It seems we have to change the source code and supply a parameter to filter data by "_COUNT_" after all cuboids are generated.

I have a question about the TopN measure: does it also filter data for the default measure _COUNT_, which is not in the TopN?

2017-09-05 15:28 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:
> Cool, that is the use case for Top-N.
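As a rough illustration of the idea in this message (filtering by "_COUNT_" only after every cuboid has been aggregated), here is a minimal Python sketch. It is not Kylin code: `build_cuboids`, `drop_long_tail` and the sample rows are hypothetical names and data invented for the example, and the threshold of 2 is taken from the question in this thread.

```python
from collections import defaultdict
from itertools import combinations

# Dimension names are taken from the message; everything else is hypothetical.
DIMENSIONS = ("date", "appId", "appVersion", "channel")


def build_cuboids(rows, dimensions=DIMENSIONS):
    """Aggregate the rows into every combination of dimensions (one cuboid each)."""
    cuboids = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cells = defaultdict(lambda: {"_COUNT_": 0, "dayActiveUseCount": 0})
            for row in rows:
                key = tuple(row[d] for d in dims)
                cells[key]["_COUNT_"] += 1
                cells[key]["dayActiveUseCount"] += row["dayActiveUseCount"]
            cuboids[dims] = dict(cells)
    return cuboids


def drop_long_tail(cuboids, measure="_COUNT_", min_value=2):
    """Keep only the cells whose aggregated measure reaches min_value."""
    return {
        dims: {key: m for key, m in cells.items() if m[measure] >= min_value}
        for dims, cells in cuboids.items()
    }


# Made-up sample data: appId "b" appears once, so it is long tail per cuboid.
SAMPLE_ROWS = [
    {"date": "2017-09-01", "appId": "a", "appVersion": "1.0", "channel": "x", "dayActiveUseCount": 1},
    {"date": "2017-09-01", "appId": "a", "appVersion": "1.1", "channel": "x", "dayActiveUseCount": 1},
    {"date": "2017-09-01", "appId": "b", "appVersion": "1.0", "channel": "y", "dayActiveUseCount": 1},
]
```

Calling `drop_long_tail(build_cuboids(SAMPLE_ROWS))` keeps the cell for appId "a" in the (appId) cuboid and drops the cell for "b". The filter is applied per cuboid, so a combination can be dropped from a fine-grained cuboid while its rows still contribute to coarser ones.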
Re: how to filter long tail data
Cool, that is the use case for Top-N.

2017-09-05 12:00 GMT+08:00 杨浩 <yangha...@gmail.com>:
> Thanks. We would like to try the Top-N measure. The "filter condition" filters data at the source, but we want to filter the data after all cuboids are built, because we don't know which data is long tail until after building.

--
Best regards,
Shaofeng Shi 史少锋
Re: how to filter long tail data
Thanks. We would like to try the Top-N measure. The "filter condition" filters data at the source, but we want to filter the data after all cuboids are built, because we don't know which data is long tail until after building.

2017-09-04 11:01 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:
> The Top-N measure is aimed at filtering the long tail data. Besides, in the data model there is a "filter condition", where you can add a filtering condition to exclude the tail data.
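The distinction made in this message, a row-level filter at the source versus a filter on aggregated results, can be sketched in a few lines of Python. This is only an illustration under assumed names (`filter_at_source`, `aggregate`, `filter_after_aggregation` are hypothetical and not part of Kylin): a source filter sees one row at a time, so it cannot tell which combinations will turn out to be rare.

```python
from collections import Counter


def filter_at_source(events, predicate):
    """Row-level filter: decides on each event without seeing any aggregate."""
    return [e for e in events if predicate(e)]


def aggregate(events, dims):
    """Count the events per combination of the given dimensions."""
    return Counter(tuple(e[d] for d in dims) for e in events)


def filter_after_aggregation(counts, min_count=2):
    """Aggregate-level filter: drops combinations seen fewer than min_count times."""
    return {key: c for key, c in counts.items() if c >= min_count}


# Made-up events: every row looks equally valid on its own.
SAMPLE_EVENTS = [
    {"appId": "a", "channel": "x"},
    {"appId": "a", "channel": "x"},
    {"appId": "b", "channel": "y"},
]
```

With these events, `filter_after_aggregation(aggregate(SAMPLE_EVENTS, ("appId", "channel")))` keeps only the ("a", "x") combination. Getting the same result from `filter_at_source` would require already knowing that appId "b" is rare, which is exactly what is unknown before the build.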
Re: how to filter long tail data
The Top-N measure is aimed at filtering the long tail data. Besides, in the data model there is a "filter condition", where you can add a filtering condition to exclude the tail data.

2017-09-04 10:54 GMT+08:00 杨浩 <yangha...@gmail.com>:
> Okay, our team wants to use Kylin as an ETL tool, but there is a lot of long tail data after building. Can this data be filtered directly by Kylin, or do we have to make some changes to the code?

--
Best regards,
Shaofeng Shi 史少锋
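To make concrete how a Top-N style counter discards the long tail, and why its counts can be approximate, here is a minimal Python sketch of a Space-Saving style counter. This is an assumption-laden illustration, not Kylin's implementation: the function name `space_saving` and the fixed `capacity` are hypothetical, and the choice of Space-Saving is itself an assumption made for the example.

```python
def space_saving(stream, capacity):
    """Track at most `capacity` items; returns (item, count, max_overestimate) tuples.

    When the table is full, the item with the smallest count is evicted and the
    newcomer inherits that count, so reported counts may overestimate the truth.
    """
    counters = {}  # item -> [count, error]
    for item in stream:
        if item in counters:
            counters[item][0] += 1
        elif len(counters) < capacity:
            counters[item] = [1, 0]
        else:
            # Evict the current minimum; the new item starts from its count.
            victim = min(counters, key=lambda k: counters[k][0])
            min_count = counters.pop(victim)[0]
            counters[item] = [min_count + 1, min_count]
    return sorted(
        ((item, count, error) for item, (count, error) in counters.items()),
        key=lambda t: -t[1],
    )
```

For a stream with five "a", three "b" and one each of "c", "d" and "e", a capacity of 3 keeps "a" and "b" with exact counts, while the last surviving tail item is reported with an inflated count and a non-zero error bound. Tail items are dropped from the table, but the counts that remain are not guaranteed to be exact.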
Re: how to filter long tail data
Please ask Kylin-related questions here.

On Fri, Sep 1, 2017 at 2:47 PM, 杨浩 <yangha...@gmail.com> wrote:
> If an index is less than 2, we don't want to store it in HBase. How can we filter the long tail data?
how to filter long tail data
If an index is less than 2, we don't want to store it in HBase. How can we filter the long tail data?