[jira] [Commented] (KYLIN-3696) TopN measure gives inaccurate statistics, far from the true values, when two cubes under the same model enable it at the same time

2018-11-27 Thread yangwei (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701316#comment-16701316
 ] 

yangwei commented on KYLIN-3696:


OK, thank you for finding the cause so quickly.

> TopN measure gives inaccurate statistics, far from the true values, when two cubes under the same model enable it at the same time
> 
>
> Key: KYLIN-3696
> URL: https://issues.apache.org/jira/browse/KYLIN-3696
> Project: Kylin
>  Issue Type: Bug
>  Components: Measure - TopN
>Affects Versions: v2.5.1
>Reporter: yangwei
>Priority: Major
> Attachments: image-2018-11-20-10-57-28-546.png, 
> image-2018-11-20-11-01-25-120.png, image-2018-11-20-11-27-43-750.png
>
>
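> I am using v2.5.1, and the totals returned by the TopN measure are inaccurate.
> Steps to reproduce:
> 1. Two cubes use the same model, i.e. the same physical fact table.
> 2. Both cubes contain the same TopN measure.
> 3. Both cubes are in the Ready state.
> My temporary workaround is to remove a TopN measure from one of the cubes.
> The same SQL returns very different results in Hive and in Kylin. The SQL is given below: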
> SELECT IP ,
>  SUM(ACCESS_COUNT) c
> FROM API_ACCESS
> WHERE TAG_DATE = CAST('2018-11-19' AS DATE)
>  group by ip
> ORDER BY 
>  c DESC
> LIMIT 10;
> Measures in the two cubes:
>  cube1:
> !image-2018-11-20-10-57-28-546.png!
> cube2:
> !image-2018-11-20-11-01-25-120.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-3696) TopN measure gives inaccurate statistics, far from the true values, when two cubes under the same model enable it at the same time

2018-11-27 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700106#comment-16700106
 ] 

Shaofeng SHI commented on KYLIN-3696:
-

[~yangwei] If a cube has only a few measures and they are all in the same HBase 
column family, Kylin stores the bytes directly in the HBase KeyValue, which 
avoids one round of deserialization and serialization. The bug is in TopN's 
serialization, so I think this is why your first cube doesn't have this problem.



[jira] [Commented] (KYLIN-3696) TopN measure gives inaccurate statistics, far from the true values, when two cubes under the same model enable it at the same time

2018-11-27 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700052#comment-16700052
 ] 

Shaofeng SHI commented on KYLIN-3696:
-

[~yangwei] I have cherry-picked that hot-fix to the 2.5.x branch. You can build 
a new binary package from there and then verify whether it fixes the problem in 
your cluster. Please share your findings with us. Thank you!
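
One way to carry out such a verification is to run the reporter's query in both Hive and Kylin and compare the returned rows. The following is only a minimal sketch in Python, assuming the rows have already been fetched from each system as (ip, total) pairs. The function name `compare_topn`, the `rel_tol` parameter, and the sample values are hypothetical illustrations, not part of Kylin and not data from this issue.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, float]
Mismatch = Tuple[str, Optional[float], Optional[float]]


def compare_topn(hive_rows: Iterable[Row], kylin_rows: Iterable[Row],
                 rel_tol: float = 0.0) -> List[Mismatch]:
    """Return (ip, hive_total, kylin_total) for every IP whose totals disagree.

    An IP that appears in only one result set is reported with None for the
    side where it is missing. rel_tol is a hypothetical relative tolerance;
    0.0 means any difference counts as a mismatch.
    """
    hive = dict(hive_rows)
    kylin = dict(kylin_rows)
    mismatches: List[Mismatch] = []
    for ip in sorted(set(hive) | set(kylin)):
        h = hive.get(ip)
        k = kylin.get(ip)
        if h is None or k is None:
            # Present in one system's top-N but not in the other's.
            mismatches.append((ip, h, k))
        elif abs(h - k) > rel_tol * max(abs(h), abs(k)):
            mismatches.append((ip, h, k))
    return mismatches


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    hive_result = [("10.0.0.1", 100), ("10.0.0.2", 80)]
    kylin_result = [("10.0.0.1", 100), ("10.0.0.2", 40), ("10.0.0.3", 5)]
    for ip, h, k in compare_topn(hive_result, kylin_result):
        print(ip, "hive =", h, "kylin =", k)
```

An empty list from `compare_topn` would indicate that the two systems agree on every row within the chosen tolerance.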



[jira] [Commented] (KYLIN-3696) TopN measure gives inaccurate statistics, far from the true values, when two cubes under the same model enable it at the same time

2018-11-26 Thread yangwei (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699807#comment-16699807
 ] 

yangwei commented on KYLIN-3696:


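I looked at KYLIN-3693. One thing is different in my case: when I build with 
Spark and few dimensions are selected for the TopN measure, as in cube1 in the 
screenshot above, the result is correct. When more dimensions are selected, as 
in cube2 in the screenshot above, the result is correct only if I build with MR.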



[jira] [Commented] (KYLIN-3696) TopN measure gives inaccurate statistics, far from the true values, when two cubes under the same model enable it at the same time

2018-11-22 Thread yangwei (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696445#comment-16696445
 ] 

yangwei commented on KYLIN-3696:


Verified: there is no problem with the MR engine.



[jira] [Commented] (KYLIN-3696) TopN measure gives inaccurate statistics, far from the true values, when two cubes under the same model enable it at the same time

2018-11-20 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693944#comment-16693944
 ] 

Shaofeng SHI commented on KYLIN-3696:
-

[~yangwei] What is the result with the MR engine?



[jira] [Commented] (KYLIN-3696) TopN measure gives inaccurate statistics, far from the true values, when two cubes under the same model enable it at the same time

2018-11-19 Thread yangwei (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692608#comment-16692608
 ] 

yangwei commented on KYLIN-3696:


OK, I will use the MR engine.



[jira] [Commented] (KYLIN-3696) TopN measure gives inaccurate statistics, far from the true values, when two cubes under the same model enable it at the same time

2018-11-19 Thread yangwei (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692597#comment-16692597
 ] 

yangwei commented on KYLIN-3696:


!image-2018-11-20-11-27-43-750.png!



[jira] [Commented] (KYLIN-3696) TopN measure gives inaccurate statistics, far from the true values, when two cubes under the same model enable it at the same time

2018-11-19 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692600#comment-16692600
 ] 

Shaofeng SHI commented on KYLIN-3696:
-

Thank you, Wei. Please switch to the MR engine and try again, and see 
KYLIN-3693. Until it is fixed, please use the MR engine.



[jira] [Commented] (KYLIN-3696) TopN measure gives inaccurate statistics, far from the true values, when two cubes under the same model enable it at the same time

2018-11-19 Thread yangwei (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692594#comment-16692594
 ] 

yangwei commented on KYLIN-3696:


spark



[jira] [Commented] (KYLIN-3696) TopN measure gives inaccurate statistics, far from the true values, when two cubes under the same model enable it at the same time

2018-11-19 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692593#comment-16692593
 ] 

Shaofeng SHI commented on KYLIN-3696:
-

Which cube engine are you using, MR or Spark?
