Partha Sarathy created HBASE-18586:
--------------------------------------

             Summary: Multiple column families - scan performance
                 Key: HBASE-18586
                 URL: https://issues.apache.org/jira/browse/HBASE-18586
             Project: HBase
          Issue Type: Bug
          Components: scan
            Reporter: Partha Sarathy


I have 2 HBase tables: one with a single column family, and the other with 4
column families. Both tables are keyed by the same rowkey, and each column
family has a single column qualifier with a JSON string as its value (each JSON
payload is about 10-20K in size). All column families use FAST_DIFF data block
encoding and GZ (gzip) compression.

After loading about 60MM rows into each table, a scan test on any single column
family in the 2nd table takes 4x as long as a scan of the single column family
in the 1st table. In both cases, the scanner is bounded by a start and stop key
so that it scans 1MM rows. Performance did not change much even after running a
major compaction on both tables.
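
For a sense of scale, here is a rough back-of-the-envelope estimate of the
uncompressed JSON payload that one such bounded scan covers for a single column
family. It is only a sketch under stated assumptions: "10-20K" is read as 10 to
20 KiB per value, and rowkeys, qualifiers, timestamps, encoding and compression
are all ignored. The class and method names are made up for illustration.

```java
import java.util.Locale;

public class ScanVolumeEstimate {
    // Rows covered by one bounded scan, as stated in the report (1MM).
    static final long ROWS_PER_SCAN = 1_000_000L;

    // Assumption: "10-20K" means 10 to 20 KiB of JSON per value.
    static final long MIN_VALUE_BYTES = 10L * 1024;
    static final long MAX_VALUE_BYTES = 20L * 1024;

    // Uncompressed payload for one column family: one value per row.
    static long payloadBytes(long rows, long valueBytes) {
        return rows * valueBytes;
    }

    // Convert a byte count to GiB (2^30 bytes).
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long low = payloadBytes(ROWS_PER_SCAN, MIN_VALUE_BYTES);
        long high = payloadBytes(ROWS_PER_SCAN, MAX_VALUE_BYTES);
        System.out.printf(Locale.ROOT,
                "one column family, 1MM rows: %.1f to %.1f GiB uncompressed%n",
                toGiB(low), toGiB(high));
    }
}
```

Under these assumptions each single-family scan should touch the same amount of
value data in either table, which is why the 4x difference is surprising.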

Though the HBase documentation and other tech forums recommend not using more
than 1 column family per table, nothing I have read so far suggests that scan
performance degrades linearly with the number of column families. Has anyone
else experienced this, and is there a simple explanation for it?

Note that the second table has 4 column families because, although I only scan
one column family at a time now, there are requirements to scan multiple column
families from that table for a given set of rowkeys.

Thanks for any insight into the performance question.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
