Hello,
I am doing a POC on Kylin cubes and have built a cube on TPC-DS data (~40 GB).
The build was successful, but I am facing issues with queries. Simple
aggregation queries return results in under a second, but queries with
ORDER BY/GROUP BY take far too long. At first, these queries failed with a
timeout error because of the record scan threshold, so I increased the
"kylin.query.scan.threshold" value in kylin.properties. That fixed the
threshold error, but the queries then took around 200 seconds, which is not
acceptable because Hive returns the result in 10 seconds for the same query.
Here is one of the queries I am trying to run (standard TPC-DS query q3):
SELECT date_dim.d_year, item.i_brand_id, item.i_brand,
       sum(facttable.ss_ext_discount_amt) sum_agg
FROM store_sales facttable
INNER JOIN date_dim date_dim ON (facttable.ss_sold_date_sk = date_dim.d_date_sk)
INNER JOIN item item ON (facttable.ss_item_sk = item.i_item_sk)
WHERE item.i_manufact_id = 783 AND date_dim.d_moy = 11
GROUP BY date_dim.d_year, item.i_brand, item.i_brand_id
ORDER BY date_dim.d_year, sum_agg DESC, item.i_brand_id
LIMIT 100;
My cluster details are:
10 nodes (each node has 32 cores, 64 GB RAM) with HDP 2.5
HBase 1.1.2.2.5.3.0-37 (fully distributed mode)

To investigate, I checked the region server logs on all the nodes and found
that during query execution only one region server was doing all the work
while the others were idle. The HBase table for my cube also showed a region
count of 1, so I tried changing the following properties, but with no luck:
kylin.hbase.hfile.size.gb=1
kylin.hbase.region.count.min=8
Please let me know if any other configuration is needed to fix the long
query time.
Thanks 
