In HBase, even if you use a KeyOnlyFilter, a column family is still involved.
In this case, if the scan does not call addFamily(), then I think all the
column families will be loaded.
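
For illustration, here is a minimal sketch of a key-only scan that is pinned
to a single family. It assumes the HBase 1.x client API (hbase-client on the
classpath); the class name KeyOnlyScanSketch and the method keyOnlyScan are
made up for this example, and the caching value of 500 is simply the one
mentioned earlier in this thread.

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

public class KeyOnlyScanSketch {

    // Hypothetical helper: builds a row-key-only scan restricted to one family.
    static Scan keyOnlyScan(byte[] startInclusive, byte[] endExclusive, byte[] family) {
        Scan scan = new Scan(startInclusive, endExclusive);

        // Without this call the scan has no family restriction,
        // so every column family of the table is scanned.
        scan.addFamily(family);

        // FirstKeyOnlyFilter keeps only the first cell of each row;
        // KeyOnlyFilter strips the value from the cells that remain.
        scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL,
                new FirstKeyOnlyFilter(), new KeyOnlyFilter()));

        scan.setCaching(500);       // rows fetched per RPC, as in the tests in this thread
        scan.setCacheBlocks(false); // do not populate the block cache during the scan
        return scan;
    }
}
```

Running the same key-only test with and without the addFamily() call on the
second table should show whether the extra families account for the 4x.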

Regards
Ram

On Tue, Aug 22, 2017 at 6:47 PM, Partha <[email protected]> wrote:

> One other observation: even scanning 1MM rowkeys (using KeyOnlyFilter and
> FirstKeyOnlyFilter) takes 4x the time on the 2nd table. No column family is
> queried at all in this test.
>
> On Aug 21, 2017 10:47 PM, "Partha" <[email protected]> wrote:
>
> > hbase(main):001:0> describe 'TABLE1'
> > Table TABLE1 is ENABLED
> > TABLE1
> > COLUMN FAMILIES DESCRIPTION
> > {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> > KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> > TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> > BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > 1 row(s) in 0.2410 seconds
> >
> > hbase(main):002:0> describe 'TABLE2'
> > Table TABLE2 is ENABLED
> > TABLE2
> > COLUMN FAMILIES DESCRIPTION
> > {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> > KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> > TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> > BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > {NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> > KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> > TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> > BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > {NAME => 'cf3', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> > KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> > TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> > BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > {NAME => 'cf4', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> > KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> > TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> > BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> >
> > Here are the table definitions.
> >
> > On Mon, Aug 21, 2017 at 10:06 PM, Partha <[email protected]> wrote:
> >
> >>       final Scan scan = new Scan(startInclusive, endExclusive)
> >>             .addFamily(stage.getBytes())
> >>             .setCaching(DEFAULT_BATCH_SIZE)
> >>             .setCacheBlocks(false);
> >>
> >> Here is the scan test code. It returns ~1MM rows from both tables,
> >> while limiting the scan to a single column family.
> >>
> >> Thanks.
> >>
> >> On Mon, Aug 21, 2017 at 2:16 PM, Partha <[email protected]> wrote:
> >>
> >>> addFamily only. There is only 1 column/qualifier per column family
> >>>
> >>>
> >>> On Aug 21, 2017 2:05 PM, "Anoop John" <[email protected]> wrote:
> >>>
> >>> In your test, are you using Scan#addColumn(byte[] family, byte[]
> >>> qualifier), or only addFamily(byte[] family)?
> >>>
> >>> On Mon, Aug 21, 2017 at 10:02 PM, Partha <[email protected]> wrote:
> >>> > Block cache is disabled on both scan tests. setCaching is set to 500 in
> >>> > both scans. HBase version is 1.1.2.2.6.0.3-8
> >>> >
> >>> > Will post client scan test code.
> >>> >
> >>> > Thanks
> >>> >
> >>> >
> >>> > On Aug 21, 2017 8:57 AM, "Anoop John" <[email protected]> wrote:
> >>> >
> >>> > I was about to ask whether you have run the tests after a major
> >>> > compaction, but it seems you are facing the same issue there too.
> >>> >
> >>> > Which version of HBase?
> >>> >
> >>> > Is the block cache being used? What are the size and configs related
> >>> > to the cache?
> >>> >
> >>> > Can you please paste the exact client-side code used in the tests?
> >>> >
> >>> > -Anoop-
> >>> >
> >>> > On Sun, Aug 20, 2017 at 4:36 AM, Partha <[email protected]> wrote:
> >>> >> Anoop,
> >>> >>
> >>> >> Yes, each column family (in both tables) uses the same encoding
> >>> >> (fast-diff)
> >>> >> and same compression (gzip).
> >>> >>
> >>> >> I suggest you just try the same simple test as in my case and see if
> >>> >> you notice a similar drop in performance (almost linear in the number
> >>> >> of column families)
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>
> >
>
