Hi, We have a hbase cluster with 8 region servers with 20G memory We have a table with 1 column family along with a secondary index. Following query took only few milliseconds when we had less data(< 1 million) After adding more data(~30M rows) the performance declined and took about a minute or more(not stable)
select msbo1.PARENTID from msbo_phoenix_comp_rowkey msbo1 left outer join ( select PARENTID,MILESTONETYPEID from msbo_phoenix_comp_rowkey where PARENTREFERENCETIME between 1479964000 and 1480464000 and OWNERORGID = 100 and PARENTTYPE = 'SHIPMENT' and MILESTONETYPEID = 19661 group by PARENTID,MILESTONETYPEID ) msbo2 on msbo1.PARENTID = msbo2.PARENTID where msbo1.PARENTTYPE = 'SHIPMENT' and msbo1.OWNERORGID = 100 and msbo2.MILESTONETYPEID is null and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000 group by msbo1.PARENTID order by msbo1.PARENTID The RHS return about a 500K rows ..LHS about 18M rows…final result about 500K rows MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME is the index Query plan: CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME [0,'SHIPMENT',100] SERVER FILTER BY FIRST KEY ONLY AND (TO_UNSIGNED_LONG("PARENTREFERENCETIME") >= 1477958400 AND TO_UNSIGNED_LONG("PARENTREFERENCETIME") <= 1480464000) SERVER AGGREGATE INTO DISTINCT ROWS BY ["MSBO1.:PARENTID"] CLIENT MERGE SORT PARALLEL LEFT-JOIN TABLE 0 CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME [0,'SHIPMENT',100,19661,1,477,958,400] - [0,'SHIPMENT',100,19661,1,480,464,000] SERVER FILTER BY FIRST KEY ONLY SERVER AGGREGATE INTO DISTINCT ROWS BY ["PARENTID", "MILESTONETYPEID"] CLIENT MERGE SORT AFTER-JOIN SERVER FILTER BY MSBO2.MILESTONETYPEID IS NULL Attached the phoenix log. I see the caching to set as 100..and "maxResultSize”:2097152..is that something that can be tuned will help? Is that the client merge sort consuming more time can be improved? Is there any other tuning possible? Thanks, Pradheep
phoenix.log
Description: phoenix.log