how to optimize kylin query

Mars J Fri, 02 Sep 2016 04:18:23 -0700

Hi ,
    How can I optimize my kylin query ?

    Cluster env. : Our cluster has 5 nodes for 1 master and 4 slaves, the
hadoop cluster and hbase cluster is on the same node. The memory is 32G /
node.


    Cube Design :
       1. tables : A fact table (200,000,000 records in hive) left join a
account dim table (200w records in hive) and left join a branch dim table.
       2. dimensions& measures : There are 4 dimensions include 2 columns
of fact table and 1 column of account dim and the other column of branch
dim.  and just add 1 measure —— sum()
       3. aggregation group : agg1. include column idate, and set this to
mandatory, agg2 include other 3 dimensions.

     Query SQL like this:
         select a.acct_no,sum(amt) from f left join acct_dim a on
f.acct_no=a.acct_no [ where idate>=... and idate<...] group by a.acct_no .
  we may change acct_no to bran_code ,or add the idate optionally, when
group by acct_no ,it costs about 5s.
         if we add select columns and join more table like 'left join
acct_dim ... left join branch_dim....' it may take more seconds.
         and when we query some detail data like some column in fact table
, it'll query time out.

       Our query is every flexible, can we optimize something to query this
group by sql in 1s and can query the detail data correctly?

   Thanks .

how to optimize kylin query

Reply via email to