In HBase, even if you use KeyOnlyFilter, a column family is still involved. In this case, if the scan does not call addFamily(), then I think all the column families will be loaded.
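
To make the point above concrete, here is a small stand-alone toy model in plain Java. It is not HBase client code: the class `FamilyResolutionSketch` and the method `familiesToOpen` are hypothetical names made up for illustration. It only sketches the rule being described, namely that a scan which names no family is treated as a scan over every family of the table, while key-only filters affect what is returned rather than which families are read.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names are HBase classes or methods.
public class FamilyResolutionSketch {

    /**
     * Returns the column families a scan would have to open.
     * Assumption: every requested family exists in the table.
     */
    public static Set<String> familiesToOpen(List<String> tableFamilies,
                                             Set<String> requested) {
        if (requested.isEmpty()) {
            // No family was named on the scan: fall back to all families.
            return new LinkedHashSet<>(tableFamilies);
        }
        // One or more families were named: open only those.
        return new LinkedHashSet<>(requested);
    }

    public static void main(String[] args) {
        // Family layouts taken from the describe output in this thread.
        List<String> table1 = Arrays.asList("cf1");
        List<String> table2 = Arrays.asList("cf1", "cf2", "cf3", "cf4");

        Set<String> noFamily = new LinkedHashSet<>();
        Set<String> onlyCf1 = new LinkedHashSet<>(Arrays.asList("cf1"));

        System.out.println("TABLE1, no family named: "
                + familiesToOpen(table1, noFamily).size() + " family opened");
        System.out.println("TABLE2, no family named: "
                + familiesToOpen(table2, noFamily).size() + " families opened");
        System.out.println("TABLE2, cf1 named:       "
                + familiesToOpen(table2, onlyCf1).size() + " family opened");
    }
}
```

Under this model the key-only scan with no family named touches four families on TABLE2 and one on TABLE1. The model says nothing about the slowdown reported when addFamily() restricts the scan to a single family, so that case would need a separate explanation.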

Regards
Ram

On Tue, Aug 22, 2017 at 6:47 PM, Partha <[email protected]> wrote:

> One other observation - even scanning 1MM rowkeys (using keyonlyfilter and
> firstkeyonlyfilter) takes 4x the time on 2nd table. No column family is
> queried at all in this test..
>
> On Aug 21, 2017 10:47 PM, "Partha" <[email protected]> wrote:
>
> > hbase(main):001:0> describe 'TABLE1'
> > Table TABLE1 is ENABLED
> > TABLE1
> > COLUMN FAMILIES DESCRIPTION
> > {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1',
> > IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
> > DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER',
> > COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> > BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > 1 row(s) in 0.2410 seconds
> >
> > hbase(main):002:0> describe 'TABLE2'
> > Table TABLE2 is ENABLED
> > TABLE2
> > COLUMN FAMILIES DESCRIPTION
> > {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1',
> > IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
> > DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER',
> > COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> > BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > {NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '1',
> > IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
> > DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER',
> > COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> > BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > {NAME => 'cf3', BLOOMFILTER => 'ROW', VERSIONS => '1',
> > IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
> > DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER',
> > COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> > BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > {NAME => 'cf4', BLOOMFILTER => 'ROW', VERSIONS => '1',
> > IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
> > DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER',
> > COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> > BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> >
> > Here are the table definitions..
> >
> > On Mon, Aug 21, 2017 at 10:06 PM, Partha <[email protected]> wrote:
> >
> >> final Scan scan = new Scan(startInclusive, endExclusive)
> >>     .addFamily(stage.getBytes())
> >>     .setCaching(DEFAULT_BATCH_SIZE)
> >>     .setCacheBlocks(false);
> >>
> >> Here is the scan test code. This will return ~1MM rows from both
> >> tables, while limiting scan to a single column family..
> >>
> >> Thanks.
> >>
> >> On Mon, Aug 21, 2017 at 2:16 PM, Partha <[email protected]> wrote:
> >>
> >>> addFamily only. There is only 1 column/qualifier per column family
> >>>
> >>> On Aug 21, 2017 2:05 PM, "Anoop John" <[email protected]> wrote:
> >>>
> >>> In ur test are u using Scan#addColumn(byte [] family, byte []
> >>> qualifier) or it is addFamily(byte [] family) only?
> >>>
> >>> On Mon, Aug 21, 2017 at 10:02 PM, Partha <[email protected]> wrote:
> >>> > Block cache is disabled on both scan tests. Setcaching is set to
> >>> > 500 in both scans. Hbase version is 1.1.2.2.6.0.3-8
> >>> >
> >>> > Will post client scan test code.
> >>> >
> >>> > Thanks
> >>> >
> >>> > On Aug 21, 2017 8:57 AM, "Anoop John" <[email protected]> wrote:
> >>> >
> >>> > I was abt to ask to whether have run the tests after a major
> >>> > compaction. But there also u are facing same issue it seems !
> >>> >
> >>> > Which version of HBase?
> >>> >
> >>> > Block cache been used? What are the size and configs related to
> >>> > cache?
> >>> >
> >>> > Can u pls paste the exact client side code been used in tests?
> >>> >
> >>> > -Anoop-
> >>> >
> >>> > On Sun, Aug 20, 2017 at 4:36 AM, Partha <[email protected]> wrote:
> >>> >> Anoop,
> >>> >>
> >>> >> Yes, each column family (in both tables) uses the same encoding
> >>> >> (fast-diff) and same compression (gzip).
> >>> >>
> >>> >> I suggest you to just try the simple test as my case and see if
> >>> >> you notice a similar drop in performance (almost linear to the
> >>> >> # of column families)
