Hi,

We have a hbase cluster with 8 region servers with 20G memory
We have a table  with 1 column family along with a secondary index.
Following query took only few milliseconds when we had less data(< 1 million)
After adding more data(~30M rows) the performance declined and took about a 
minute or more(not stable)

select msbo1.PARENTID
  from msbo_phoenix_comp_rowkey msbo1
  left outer join (
         select PARENTID,MILESTONETYPEID
           from msbo_phoenix_comp_rowkey
          where PARENTREFERENCETIME between 1479964000 and 1480464000
            and OWNERORGID = 100
            and PARENTTYPE = 'SHIPMENT'
            and MILESTONETYPEID = 19661
            group by PARENTID,MILESTONETYPEID
             ) msbo2
  on msbo1.PARENTID = msbo2.PARENTID
  where msbo1.PARENTTYPE = 'SHIPMENT'
   and msbo1.OWNERORGID = 100
   and msbo2.MILESTONETYPEID is null
   and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
group by msbo1.PARENTID
  order by msbo1.PARENTID

The RHS return about a 500K rows ..LHS about 18M rows…final result about 500K 
rows

MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
  is the index

Query plan:
CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER 
MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
 [0,'SHIPMENT',100]
    SERVER FILTER BY FIRST KEY ONLY AND 
(TO_UNSIGNED_LONG("PARENTREFERENCETIME") >= 1477958400 AND 
TO_UNSIGNED_LONG("PARENTREFERENCETIME") <= 1480464000)
    SERVER AGGREGATE INTO DISTINCT ROWS BY ["MSBO1.:PARENTID"]
CLIENT MERGE SORT
    PARALLEL LEFT-JOIN TABLE 0
        CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER 
MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
 [0,'SHIPMENT',100,19661,1,477,958,400] - [0,'SHIPMENT',100,19661,1,480,464,000]
            SERVER FILTER BY FIRST KEY ONLY
            SERVER AGGREGATE INTO DISTINCT ROWS BY ["PARENTID", 
"MILESTONETYPEID"]
        CLIENT MERGE SORT
    AFTER-JOIN SERVER FILTER BY MSBO2.MILESTONETYPEID IS NULL

Attached the phoenix log.
I see the caching to set as 100..and "maxResultSize”:2097152..is that something 
that can be tuned will help?
Is that the client merge sort consuming more time can be improved? Is there any 
other tuning possible?

Thanks,
Pradheep

Attachment: phoenix.log
Description: phoenix.log

Reply via email to