One other observation: even scanning 1MM row keys (using KeyOnlyFilter and FirstKeyOnlyFilter) takes 4x as long on the second table. No column family is queried at all in this test.

On Aug 21, 2017 10:47 PM, "Partha" <[email protected]> wrote:

> hbase(main):001:0> describe 'TABLE1'
> Table TABLE1 is ENABLED
> TABLE1
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.2410 seconds
>
> hbase(main):002:0> describe 'TABLE2'
> Table TABLE2 is ENABLED
> TABLE2
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> {NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> {NAME => 'cf3', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> {NAME => 'cf4', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
>
> Here are the table definitions..
>
> On Mon, Aug 21, 2017 at 10:06 PM, Partha <[email protected]> wrote:
>
>> final Scan scan = new Scan(startInclusive, endExclusive)
>>     .addFamily(stage.getBytes())
>>     .setCaching(DEFAULT_BATCH_SIZE)
>>     .setCacheBlocks(false);
>>
>> Here is the scan test code. This will return ~1MM rows from both tables,
>> while limiting scan to a single column family..
>>
>> Thanks.
>>
>> On Mon, Aug 21, 2017 at 2:16 PM, Partha <[email protected]> wrote:
>>
>>> addFamily only. There is only 1 column/qualifier per column family
>>>
>>> On Aug 21, 2017 2:05 PM, "Anoop John" <[email protected]> wrote:
>>>
>>> In ur test are u using Scan#addColumn(byte [] family, byte [] qualifier)
>>> or it is addFamily(byte [] family) only?
>>>
>>> On Mon, Aug 21, 2017 at 10:02 PM, Partha <[email protected]> wrote:
>>> > Block cache is disabled on both scan tests. Setcaching is set to 500
>>> > in both scans. Hbase version is 1.1.2.2.6.0.3-8
>>> >
>>> > Will post client scan test code.
>>> >
>>> > Thanks
>>> >
>>> > On Aug 21, 2017 8:57 AM, "Anoop John" <[email protected]> wrote:
>>> >
>>> > I was abt to ask to whether have run the tests after a major
>>> > compaction. But there also u are facing same issue it seems !
>>> >
>>> > Which version of HBase?
>>> >
>>> > Block cache been used? What are the size and configs related to cache?
>>> >
>>> > Can u pls paste the exact client side code been used in tests?
>>> >
>>> > -Anoop-
>>> >
>>> > On Sun, Aug 20, 2017 at 4:36 AM, Partha <[email protected]> wrote:
>>> >> Anoop,
>>> >>
>>> >> Yes, each column family (in both tables) uses the same encoding
>>> >> (fast-diff) and same compression (gzip).
>>> >>
>>> >> I suggest you to just try the simple test as my case and see if you
>>> >> notice a similar drop in performance (almost linear to the # of
>>> >> column families)
