答复: 答复: cube关联后的distinct问题

高扬02 Mon, 29 Oct 2018 03:07:16 -0700

Hi Xiaoxiang,
I tried to query pushdown to solve this case , but it is too slow to meet the 
need of us.
I will try something else to solve this problem.
Thank you for your advice.


Best regards,
Yang Gao


发件人: Xiaoxiang Yu [mailto:xiaoxiang...@kyligence.io]
发送时间: 2018年10月29日 17:39
收件人: user@kylin.apache.org
主题: Re: 答复: cube关联后的distinct问题


Hello gaoyang,



In your case,  you have two fact tables which both has one column "user_id", 
these two columns both have duplicate values so you cannot use one of these 
fact tables as a lookup table. You want get the distinct count of user_id which 
occur in both two sides.



If what I understand is correct, I guess you should enable query pushdown to 
let other sql engine to do such task.

----------------
Best wishes,
Xiaoxiang Yu



发件人: 高扬02 <gaoyan...@bianfeng.com<mailto:gaoyan...@bianfeng.com>>
答复: "user@kylin.apache.org<mailto:user@kylin.apache.org>" 
<user@kylin.apache.org<mailto:user@kylin.apache.org>>
日期: 2018年10月29日 星期一 16:50
收件人: "user@kylin.apache.org<mailto:user@kylin.apache.org>" 
<user@kylin.apache.org<mailto:user@kylin.apache.org>>
主题: 答复: cube关联后的distinct问题

Hi Xiaoxiang,
Thank you for your reply.
So you mean , we cannot use Kylin to deal with the intersection of two sets 
from two fact tables , is that true ?
If that is true , I cannot use “join” between two fact tables in two cubes.
And I cannot create a snowflake schema with the tow fact tables , because they 
have multi-multi records after “join” , and the Exception is ” Dup key found” .
Please give me a suggestion , how to get the intersection of two sets from  two 
fact tables.
Thanks a lot.
Best regards.
Yang Gao

发件人: Xiaoxiang Yu [mailto:xiaoxiang...@kyligence.io]
发送时间: 2018年10月29日 15:44
收件人: user@kylin.apache.org<mailto:user@kylin.apache.org>
主题: Re: cube关联后的distinct问题
重要性: 高

Hi, kylin is a OLAP engine, it support star schema and snowflake schema. Both 
star schema and snowflake schema support only one fact table. If these two 
tables cannnot be found in one model, it cannot be executed on kylin.
As FAQ says [http://kylin.apache.org/docs/gettingstarted/faq.html]


Is Kylin a generic SQL engine for big data?

  *   No, Kylin is an OLAP engine with SQL interface. The SQL queries need be 
matched with the pre-defined OLAP model.


----------------
Xiaoxiang Yu


发件人: 高扬02 <gaoyan...@bianfeng.com<mailto:gaoyan...@bianfeng.com>>
答复: "user@kylin.apache.org<mailto:user@kylin.apache.org>" 
<user@kylin.apache.org<mailto:user@kylin.apache.org>>
日期: 2018年10月25日 星期四 15:10
收件人: "user@kylin.apache.org<mailto:user@kylin.apache.org>" 
<user@kylin.apache.org<mailto:user@kylin.apache.org>>
主题: cube关联后的distinct问题

请教各位：

我需要关联两个fact表，所以目前是分别构建两个cube，要关联的字段为user_id，我将它设置同时为dimension和measure，以便可以取得明细（dimension）和计算count_distinct（measure），且不用开启query-push-down
但两个子查询分别命中了对应的cuboid，可是最外层去重查询报错，未知原因？

sql如下：
select count(distinct a.r_id) from
(select r.user_id as r_id,l.user_id as l_id from
(select user_id from register where data_day='2018-09-01' and 
app_id='403_20170811' )r
inner join
(select user_id from login where data_day='2018-09-01' and 
app_id='403_20170811')l
on l.user_id=r.user_id
)a;

报错：
Can't create EnumerableAggregate! while executing SQL: "select count(distinct 
a.r_bmid) from (select r.bmid_user as r_bmid,l.bmid_user as l_bmid from (select 
bmid_user from wl_register_bm_mes where data_day='2018-09-01' and 
app_id='403_20170811' )r inner join (select bmid_user from wl_login_new_mes 
where data_day='2018-10-14' and app_id='403_20170811')l on 
l.bmid_user=r.bmid_user )a LIMIT 50000"

期待解答。thx!

答复: 答复: cube关联后的distinct问题

Reply via email to