Hi ,
How can I optimize my kylin query ?
Cluster env. : Our cluster has 5 nodes for 1 master and 4 slaves, the
hadoop cluster and hbase cluster is on the same node. The memory is 32G /
node.
Cube Design :
1. tables : A fact table (200,000,000 records in hive) left join a
account dim table (200w records in hive) and left join a branch dim table.
2. dimensions& measures : There are 4 dimensions include 2 columns
of fact table and 1 column of account dim and the other column of branch
dim. and just add 1 measure —— sum()
3. aggregation group : agg1. include column idate, and set this to
mandatory, agg2 include other 3 dimensions.
Query SQL like this:
select a.acct_no,sum(amt) from f left join acct_dim a on
f.acct_no=a.acct_no [ where idate>=... and idate<...] group by a.acct_no .
we may change acct_no to bran_code ,or add the idate optionally, when
group by acct_no ,it costs about 5s.
if we add select columns and join more table like 'left join
acct_dim ... left join branch_dim....' it may take more seconds.
and when we query some detail data like some column in fact table
, it'll query time out.
Our query is every flexible, can we optimize something to query this
group by sql in 1s and can query the detail data correctly?
Thanks .