[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15737937#comment-15737937
 ] 

hongbin ma commented on KYLIN-2088:
---

Sorry, it should be Yerui :)

> Support intersect count for calculation of retention or conversion rates
> 
>
> Key: KYLIN-2088
> URL: https://issues.apache.org/jira/browse/KYLIN-2088
> Project: Kylin
> Issue Type: New Feature
> Components: Query Engine
> Reporter: Yerui Sun
> Assignee: Yerui Sun
> Fix For: v1.6.0
>
> Attachments: KYLIN-2088.patch
>
>
> Retention and conversion rates are very important in data analysis.
> They can be calculated from two datasets that correspond to two different
> values of one dimension. For example, suppose we have a count distinct
> measure, such as uv (a dataset of uuids), and one dimension, such as date.
> The retention of uv between '20161015' and '20161016' is then the size of
> the intersection of the two uv datasets.
> Fortunately, Kylin already implements such datasets as bitmaps, for precise
> count distinct. Only a UDAF is needed to calculate the intersection of two
> or more bitmaps.
> I'll try this and post a patch later.
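
The idea in the description above can be illustrated with a minimal sketch. It is not Kylin's implementation: Python integers stand in for the bitmaps, and the helper names (to_bitmap, cardinality, intersect_count) and the sample user ids are hypothetical, chosen only for illustration.

```python
from functools import reduce


def to_bitmap(user_ids):
    """Encode small non-negative integer ids as bits of a Python int (a stand-in bitmap)."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def cardinality(bitmap):
    """Number of ids present in the bitmap (the precise distinct count)."""
    return bin(bitmap).count("1")


def intersect_count(bitmaps):
    """Distinct count of the ids that appear in every one of the given bitmaps."""
    bitmaps = list(bitmaps)
    if not bitmaps:
        return 0
    return cardinality(reduce(lambda a, b: a & b, bitmaps))


# Hypothetical uv datasets, one bitmap per value of the date dimension.
uv_by_date = {
    "20161015": to_bitmap([1, 2, 3, 5, 8]),
    "20161016": to_bitmap([2, 3, 8, 13]),
}

# Users seen on both days, and the retention rate relative to the first day.
retained = intersect_count([uv_by_date["20161015"], uv_by_date["20161016"]])
retention_rate = retained / cardinality(uv_by_date["20161015"])
```

Because the intersection works on the per-day bitmaps that already back the precise count distinct measure, no additional per-user data is needed in this sketch.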



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15737935#comment-15737935
 ] 

hongbin ma commented on KYLIN-2088:
---

Thanks dayue, the blog is already there: 
http://kylin.apache.org/blog/2016/11/28/intersect-count/



[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates

2016-10-18 Thread Yerui Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587815#comment-15587815
 ] 

Yerui Sun commented on KYLIN-2088:
--

Sure, I've written a blog post on its usage and plan to publish it after this 
feature is released.



[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates

2016-10-18 Thread Billy(Yiming) Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587489#comment-15587489
 ] 

Billy(Yiming) Liu commented on KYLIN-2088:
--

Hi [~sunyerui], this is a really cool feature. Many users are looking forward 
to trying it. Could you prepare a blog post to introduce it?



[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates

2016-10-17 Thread Yerui Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581690#comment-15581690
 ] 

Yerui Sun commented on KYLIN-2088:
--

I tested the refined code and it worked fine. 
The code looks better and more graceful. Thanks, [~liyang.g...@gmail.com].



[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates

2016-10-16 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581090#comment-15581090
 ] 

liyang commented on KYLIN-2088:
---

I've done the refactoring and merged it back to master.

The refactoring is pretty straightforward. MeasureType now supports multiple 
UDAFs on a type. I also removed the repeated declaration of the UDF.

[~sunyerui], please double check on the master branch.
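
The shape of that change, one measure type exposing several aggregation functions, can be sketched roughly as follows. This is only an illustration under assumptions, not Kylin's actual MeasureType API: the class BitmapMeasureType, its udafs() method, and resolve_udaf are hypothetical names, and Python sets stand in for bitmaps.

```python
class BitmapMeasureType:
    """Hypothetical measure type: one stored data type, several aggregation functions."""

    data_type = "bitmap"

    def udafs(self):
        # Map each SQL-level function name to the aggregator that implements it.
        return {
            "COUNT_DISTINCT": self._count_distinct,
            "INTERSECT_COUNT": self._intersect_count,
        }

    @staticmethod
    def _count_distinct(bitmaps):
        # Distinct count over the union of all input bitmaps.
        return len(set().union(*bitmaps))

    @staticmethod
    def _intersect_count(bitmaps):
        # Distinct count over the intersection of all input bitmaps.
        bitmaps = [set(b) for b in bitmaps]
        if not bitmaps:
            return 0
        return len(set.intersection(*bitmaps))


def resolve_udaf(measure_type, function_name):
    """Look up an aggregation function by name on a single measure type."""
    try:
        return measure_type.udafs()[function_name]
    except KeyError:
        raise ValueError(
            "%s is not defined on type %s" % (function_name, measure_type.data_type)
        )


# Hypothetical per-day user sets.
day1, day2 = {1, 2, 3}, {2, 3, 4}
bitmap_type = BitmapMeasureType()
total_users = resolve_udaf(bitmap_type, "COUNT_DISTINCT")([day1, day2])
returning_users = resolve_udaf(bitmap_type, "INTERSECT_COUNT")([day1, day2])
```

In this sketch both functions are declared once, on the type that owns the bitmap representation, rather than as two separate type declarations.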



[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates

2016-10-13 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574417#comment-15574417
 ] 

liyang commented on KYLIN-2088:
---

[~sunyerui], let me create a branch for KYLIN-2088 and refactor a little there. 
We can review the result and then merge back to master.



[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates

2016-10-13 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574410#comment-15574410
 ] 

liyang commented on KYLIN-2088:
---

Right now, the MeasureType extension point represents a combination of an 
aggregation function and a data type. If we follow this design, then two 
MeasureTypes need to be declared -- COUNT_DISTINCT(BITMAP) and 
INTERSECT_COUNT(BITMAP).

Also, the current UDF somewhat duplicates MeasureType; I may combine 
them into one.



[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates

2016-10-13 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574379#comment-15574379
 ] 

liyang commented on KYLIN-2088:
---

This is a case where multiple aggregation functions are defined on one measure 
type. Let me think about it...
