I wonder if the aim of the proposal should be to hide the complexity of the raw 
HBase options and provide a better semantic specification for the 
HBASE_OPTIONS clause.


From 
http://archive.cloudera.com/cdh5/cdh/5/hbase-0.98.6-cdh5.2.4/book/compression.html,
there is a good set of rules for the choice of compression.


  *   If you have long keys (compared to the values) or many columns, use a 
prefix encoder. FAST_DIFF is recommended, as more testing is needed for Prefix 
Tree encoding.
  *   If the values are large (and not precompressed, such as images), use a 
data block compressor.
  *   Use GZIP for cold data, which is accessed infrequently. GZIP compression 
uses more CPU resources than Snappy or LZO, but provides a higher compression 
ratio.
  *   Use Snappy or LZO for hot data, which is accessed frequently. Snappy and 
LZO use fewer CPU resources than GZIP, but do not provide as high of a 
compression ratio.
  *   In most cases, enabling Snappy or LZO by default is a good choice, 
because they have a low performance overhead and provide space savings.
  *   Before Snappy became available by Google in 2011, LZO was the default. 
Snappy has similar qualities as LZO but has been shown to perform better.
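
As a rough illustration only, the rules above could be written as a small 
decision function. This is a Python sketch, not anything in HBase or 
Trafodion: the TableProfile class and the suggest_options function are 
hypothetical names, and treating precompressed data as COMPRESSION = 'NONE' is 
one reading of the second rule. 'GZ' is the HBase name for GZIP.

```python
from dataclasses import dataclass


@dataclass
class TableProfile:
    """Hypothetical description of a table's data (not an HBase type)."""
    long_keys_or_many_columns: bool
    precompressed: bool
    hot: bool


def suggest_options(profile: TableProfile) -> dict:
    """Apply the quoted rules of thumb and return raw HBase option values."""
    options = {}
    # Long keys (relative to values) or many columns: prefix encoder.
    if profile.long_keys_or_many_columns:
        options["DATA_BLOCK_ENCODING"] = "FAST_DIFF"
    if profile.precompressed:
        # Assumption: already-compressed values (e.g. images) gain little.
        options["COMPRESSION"] = "NONE"
    elif profile.hot:
        # Hot data: Snappy (or LZO) for low CPU overhead.
        options["COMPRESSION"] = "SNAPPY"
    else:
        # Cold data: GZIP ('GZ' in HBase) for a higher compression ratio.
        options["COMPRESSION"] = "GZ"
    return options


print(suggest_options(TableProfile(True, False, True)))
```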


Instead of saying

   DATA_BLOCK_ENCODING = 'FAST_DIFF',


we can say the following and provide a good compression scheme for the type of 
data behind the scenes.

  OPTIMIZED for <type> data
  <type> := cold | hot | image

(<type> defaults to hot if the OPTIMIZED clause is omitted.)


On encoding: FAST_DIFF appears to be the recommended choice, so we can just 
make it the default value.

On memstore flush size: the default value should be based on the nature of the 
table (dimension or fact) and its workload type (TPC-DS or OLTP), which 
together describe the access attribute of the data. That attribute can take a 
value of load_and_read or read_write. Call the attribute <access>.

So in the end, an HBASE_OPTIONS clause with a better semantic specification 
could be written as follows.

HBASE_OPTIONS
  (
    optimized for <type>, <access> data
  )


Examples:

HBASE_OPTIONS
  (
    optimized for hot, read_write data
  )

HBASE_OPTIONS
  (
    optimized for cold, load_and_read data
  )
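
To make the idea concrete, here is a minimal Python sketch of how such a 
clause might be expanded into the raw options behind the scenes. Everything in 
it is an assumption for illustration: the expand_clause function, the 
keyword-to-value tables, the read_write default for <access>, and in particular 
the flush sizes (only 1073741824 appears in this thread; 134217728 is the stock 
HBase 128 MB default). It is not how Trafodion parses HBASE_OPTIONS.

```python
import re

# Assumed keyword-to-value tables; the thread proposes keywords, not values.
COMPRESSION_BY_TYPE = {"hot": "SNAPPY", "cold": "GZ", "image": "NONE"}
FLUSH_SIZE_BY_ACCESS = {"load_and_read": "1073741824", "read_write": "134217728"}

# Matches "optimized for <type> data" or "optimized for <type>, <access> data".
CLAUSE = re.compile(
    r"optimized\s+for\s+(\w+)\s*(?:,\s*(\w+))?\s+data", re.IGNORECASE
)


def expand_clause(clause=None):
    """Return raw HBASE_OPTIONS text for an 'optimized for ...' clause."""
    # hot is the proposed default <type>; the <access> default is assumed.
    data_type, access = "hot", "read_write"
    if clause is not None:
        match = CLAUSE.fullmatch(clause.strip())
        if match is None:
            raise ValueError(f"unrecognized clause: {clause!r}")
        data_type = match.group(1).lower()
        access = (match.group(2) or access).lower()
    if data_type not in COMPRESSION_BY_TYPE or access not in FLUSH_SIZE_BY_ACCESS:
        raise ValueError(f"unknown <type> or <access> in: {clause!r}")
    return (
        "HBASE_OPTIONS\n"
        "  (\n"
        "    DATA_BLOCK_ENCODING = 'FAST_DIFF',\n"
        f"    COMPRESSION = '{COMPRESSION_BY_TYPE[data_type]}',\n"
        f"    MEMSTORE_FLUSH_SIZE = '{FLUSH_SIZE_BY_ACCESS[access]}'\n"
        "  )"
    )


print(expand_clause("optimized for cold, load_and_read data"))
```

Keeping the values in two small tables means the defaults could be tuned in 
one place without touching any CREATE TABLE statement.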

And one can of course use the CQD method to provide a default HBASE_OPTIONS 
clause.

Just my 2 cents.

Thanks --Qifan
________________________________
From: Rohit Jain <rohit.j...@esgyn.com>
Sent: Wednesday, June 7, 2017 11:51:47 AM
To: user@trafodion.incubator.apache.org
Subject: RE: Make "HBase options" as default setting?

I am suggesting that we change the default settings to those proposed so 
that all tables use those settings by default.  So, it is a +1.  A CQD might be 
used to override the default.  This, hopefully, will be rare.  I just think that 
having entries in a system DEFAULTS table is problematic for the reasons I 
mentioned. Of course, they would be in the compiler’s default settings.

Rohit

From: Dave Birdsall [mailto:dave.birds...@esgyn.com]
Sent: Wednesday, June 7, 2017 11:28 AM
To: user@trafodion.incubator.apache.org
Subject: RE: Make "HBase options" as default setting?

So, would this make you a -1 on the original suggestion? (that is, just 
changing the HBASE_OPTIONS defaults?)

From: Rohit Jain [mailto:rohit.j...@esgyn.com]
Sent: Wednesday, June 7, 2017 9:16 AM
To: user@trafodion.incubator.apache.org
Subject: RE: Make "HBase options" as default setting?

Dave,

CQDs indicate that we might make a different decision on these for certain 
customers.  If that is not the case, then we are just increasing the complexity 
of configuring a system with the right defaults, etc., and opening ourselves up 
to the possibility that something may go wrong.  Oftentimes, folks are not even 
aware of what is set in the DEFAULTS table and are puzzled by the behavior, 
sometimes because someone added something to that table or changed a setting 
and forgot to set it back, etc.  Then you have to document these CQDs, when to 
change them, etc.  So, while the flexibility of having sooooo many buttons may 
be a good thing to cover all possible theoretical combinations one might come 
up with, from a practical standpoint the KISS (Keep It Simple, Stupid) 
principle trumps all.

Rohit

From: Dave Birdsall [mailto:dave.birds...@esgyn.com]
Sent: Wednesday, June 7, 2017 10:44 AM
To: user@trafodion.incubator.apache.org
Subject: RE: Make "HBase options" as default setting?

What might make sense is to add a CQD (or maybe a set of them) with default 
values for certain HBASE_OPTIONS settings. Then one could put these CQDs in the 
system DEFAULTS table. So, you’d set it once in a cluster installation and then 
not have to worry about it afterwards.

From: Dave Birdsall
Sent: Wednesday, June 7, 2017 8:26 AM
To: user@trafodion.incubator.apache.org
Subject: RE: Make "HBase options" as default setting?

-1  Snappy is often not installed on workstations, so I would not want to make 
that the default.

From: Eric Owhadi [mailto:eric.owh...@esgyn.com]
Sent: Wednesday, June 7, 2017 4:52 AM
To: user@trafodion.incubator.apache.org
Subject: Re: Make "HBase options" as default setting?

+1





-------- Original message --------
From: "Liu, Yuan (Yuan)" <yuan....@esgyn.cn>
Date: 6/6/17 10:29 PM (GMT-06:00)
To: user@trafodion.incubator.apache.org
Subject: Make "HBase options" as default setting?

Hi Trafodioners,

As we know, for performance reasons we always need to add the syntax below to 
the “create table” statement. I know that in the latest version we have made 
“ATTRIBUTES ALIGNED FORMAT” the default. Do we have a plan to make 
HBASE_OPTIONS (including ENCODING, COMPRESSION, and MEMSTORE) the default 
setting when creating a table? I think this would be easier for new users.

ATTRIBUTES ALIGNED FORMAT
  HBASE_OPTIONS
  (
    DATA_BLOCK_ENCODING = 'FAST_DIFF',
    COMPRESSION = 'SNAPPY',
    MEMSTORE_FLUSH_SIZE = '1073741824'
  )
;



Best regards,
Yuan
Email: yuan....@esgyn.cn
Cellphone: (+86) 13671935540
