Hi Johnson,

This is a good observation. For a single cube's query, Kylin can dispatch
that to HBase for in parallel computing, which usually is sub-seconds. When
there are multiple sub-queries
together, Kylin will execute these sub-queries in sequence. After getting
the result back to the Kylin query node (in memory),  it will execute the
final round computing like joining, filtering, etc.
Depends on the data size, the time it takes varies. That is what you
observed.

A workaround is, you can define a new model/cube with the tables together.
Then no need to use sub-queries.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]




Johnson <[email protected]> 于2019年11月15日周五 下午6:14写道:

> 由于前端需要将多个指标合并展示,故查询kylin时,一个sql可能会嵌套多个多个子查询。生产环境发现,子查询严重影响查询速度。大家知道什么优化方法吗?
> 测试如下:
> 1.计算活跃设备
> select count(distinct deviceid) dad from
> KYLIN_VIEW.KYLIN_VIEW_T_DWA_ACT_XXX_DEVICE_ACTIVE
> where par_dt>='2019-06-01' and par_dt<='2019-11-15'
> group by par_dt
> 2.计算活跃用户
> select g.par_dt,count(distinct g.userid) "activeAccount" from
> KYLIN_VIEW.KYLIN_VIEW_T_DWA_ACT_XXX_USER_MULTIDIM_ACTIVE g
> where g.par_dt>='2019-06-01' and g.par_dt<='2019-11-01'
> group by g.par_dt
> 分别执行两个sql耗时基本在0.5s左右。
>
> 3.采用子查询:
> select g.par_dt,count(distinct g.userid) "activeAccount",a.dad
> "activeDevice"
> from KYLIN_VIEW.KYLIN_VIEW_T_DWA_ACT_XXX_USER_MULTIDIM_ACTIVE g
> left join(
> select par_dt,count(distinct deviceid) dad from
> KYLIN_VIEW.KYLIN_VIEW_T_DWA_ACT_XXX_DEVICE_ACTIVE
> where par_dt>='2019-06-01' and par_dt<='2019-11-01'
> group by par_dt
> ) a
> on g.par_dt = a.par_dt
> where g.par_dt>='2019-06-01' and g.par_dt<='2019-11-01'
> group by g.par_dt,a.dad
> 查询耗时多达1.5s,*当子查询增多时,查询耗时高达几十秒*。如下图,对于此结果不是很理解,因为子查询完全可以并行去查,然后子查询的结果(数据量已经超级小了)在kylin
> server端聚合,应该很快啊,为什么会这么慢?
>
>
>
>

Reply via email to