[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates
[ https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15737937#comment-15737937 ]

hongbin ma commented on KYLIN-2088:
-----------------------------------

Sorry, it should be Yerui :)

> Support intersect count for calculation of retention or conversion rates
> ------------------------------------------------------------------------
>
>                 Key: KYLIN-2088
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2088
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Query Engine
>            Reporter: Yerui Sun
>            Assignee: Yerui Sun
>             Fix For: v1.6.0
>
>         Attachments: KYLIN-2088.patch
>
>
> Retention and conversion rates are very important in data analysis.
> A rate can be calculated from two datasets taken at two different values of
> one dimension. For example, given a count distinct measure such as uv (a
> dataset of uuids) and a dimension such as date, the retention of uv between
> '20161015' and '20161016' is the intersection of the two uv datasets.
> Fortunately, Kylin already implements such datasets as bitmaps, for precise
> count distinct. Only a UDAF is needed to calculate the intersection of two or
> more bitmaps.
> I'll try this and post a patch later.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
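The retention calculation described in the issue can be sketched in a few lines. This is only an illustration and not Kylin's implementation: plain Python sets stand in for the bitmaps, and the names `uv_by_date` and `intersect_count`, as well as the sample uuids, are hypothetical.

```python
# Illustrative sketch only: Python sets stand in for Kylin's bitmaps,
# and all data below is made up for the example.
from functools import reduce

# One "uv" dataset (distinct uuids) per value of the date dimension.
uv_by_date = {
    "20161015": {"u1", "u2", "u3", "u4"},
    "20161016": {"u2", "u4", "u5"},
}


def intersect_count(datasets, keys):
    """Count the ids present in every dataset selected by `keys`."""
    selected = [datasets[k] for k in keys]
    if not selected:
        return 0
    return len(reduce(set.intersection, selected))


# Users seen on the first day, and those seen again on the second day.
day1 = len(uv_by_date["20161015"])
retained = intersect_count(uv_by_date, ["20161015", "20161016"])
retention_rate = retained / day1
```

Because the per-date datasets are already stored for count distinct, the only extra work at query time in this sketch is the intersection itself, which is the point the issue makes about needing only a UDAF.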
[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates
[ https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15737935#comment-15737935 ]

hongbin ma commented on KYLIN-2088:
-----------------------------------

Thanks dayue, the blog is already there: http://kylin.apache.org/blog/2016/11/28/intersect-count/
[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates
[ https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587815#comment-15587815 ]

Yerui Sun commented on KYLIN-2088:
----------------------------------

Sure, I've written a blog post on its usage, and I'm planning to publish it after this feature is released.
[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates
[ https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587489#comment-15587489 ]

Billy(Yiming) Liu commented on KYLIN-2088:
------------------------------------------

Hi [~sunyerui], this is a really cool feature. Many users are eager to try it. Could you prepare a blog post to introduce it?
[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates
[ https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581690#comment-15581690 ]

Yerui Sun commented on KYLIN-2088:
----------------------------------

I tested the refined code and it works fine. The code looks better and more graceful. Thanks, [~liyang.g...@gmail.com].
[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates
[ https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581090#comment-15581090 ]

liyang commented on KYLIN-2088:
-------------------------------

I've done the refactoring and merged it back to master. The refactoring is pretty straightforward: MeasureType now supports multiple UDAFs on a type. I also removed the repeated declaration of the UDF. [~sunyerui], please double-check on the master branch.
[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates
[ https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574417#comment-15574417 ]

liyang commented on KYLIN-2088:
-------------------------------

[~sunyerui], let me create a branch for KYLIN-2088 and refactor a little there. We can review the result and then merge back to master.
[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates
[ https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574410#comment-15574410 ]

liyang commented on KYLIN-2088:
-------------------------------

Right now, the MeasureType extension point represents a combination of an aggregation function and a data type. If we follow this design, then two MeasureTypes need to be declared: COUNT_DISTINCT(BITMAP) and INTERSECT_COUNT(BITMAP). Also, the current UDF somewhat duplicates MeasureType; I may combine them into one.
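The design question in this comment can be pictured with a rough sketch. Everything here is an assumption made for illustration: the class and registry names are hypothetical and are not Kylin's MeasureType API, and Python sets again stand in for bitmaps. It contrasts a registry keyed by (function, data type), which needs two entries for the bitmap type, with one entry per data type that exposes several aggregation functions.

```python
# Hypothetical illustration; these names are NOT Kylin's MeasureType API.

class BitmapMeasure:
    """One bitmap-backed measure type exposing several aggregation functions."""

    data_type = "BITMAP"

    @staticmethod
    def count_distinct(bitmaps):
        # Union of all bitmaps, then cardinality.
        return len(set().union(*bitmaps))

    @staticmethod
    def intersect_count(bitmaps):
        # Intersection of all bitmaps, then cardinality.
        bitmaps = list(bitmaps)
        if not bitmaps:
            return 0
        return len(set(bitmaps[0]).intersection(*bitmaps[1:]))


# Design A: one registry entry per (function, data type) pair.
registry_by_pair = {
    ("COUNT_DISTINCT", "BITMAP"): BitmapMeasure.count_distinct,
    ("INTERSECT_COUNT", "BITMAP"): BitmapMeasure.intersect_count,
}

# Design B: one entry per data type, listing every function it supports.
registry_by_type = {
    "BITMAP": {
        "COUNT_DISTINCT": BitmapMeasure.count_distinct,
        "INTERSECT_COUNT": BitmapMeasure.intersect_count,
    },
}

bitmaps = [{1, 2, 3}, {2, 3, 4}]
distinct = registry_by_pair[("COUNT_DISTINCT", "BITMAP")](bitmaps)
common = registry_by_type["BITMAP"]["INTERSECT_COUNT"](bitmaps)
```

In both shapes the two functions share the same underlying bitmap data; the difference is only in how many declarations the extension point requires.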
[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates
[ https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574379#comment-15574379 ]

liyang commented on KYLIN-2088:
-------------------------------

This is a case where multiple aggregation functions are defined on one measure type. Let me think about it...