I wonder if the nature of the proposal should be to hide the complexity of the raw HBase options and provide a better semantic specification for the HBASE_OPTIONS clause.

From http://archive.cloudera.com/cdh5/cdh/5/hbase-0.98.6-cdh5.2.4/book/compression.html, there is a good set of rules for the choice of compression.

* If you have long keys (compared to the values) or many columns, use a prefix encoder. FAST_DIFF is recommended, as more testing is needed for Prefix Tree encoding.
* If the values are large (and not precompressed, such as images), use a data block compressor.
* Use GZIP for cold data, which is accessed infrequently. GZIP compression uses more CPU resources than Snappy or LZO, but provides a higher compression ratio.
* Use Snappy or LZO for hot data, which is accessed frequently. Snappy and LZO use fewer CPU resources than GZIP, but do not provide as high of a compression ratio.
* In most cases, enabling Snappy or LZO by default is a good choice, because they have a low performance overhead and provide space savings.
* Before Snappy became available by Google in 2011, LZO was the default. Snappy has similar qualities as LZO but has been shown to perform better.

Instead of saying DATA_BLOCK_ENCODING = 'FAST_DIFF', we can say the following and provide a good compression scheme for the type of data behind the scenes.

    OPTIMIZED for <type> data
    <type> := cold | hot | image

(default to hot data, if the OPTIMIZED clause is omitted)

On encoding: it looks like FAST_DIFF is recommended, and therefore we can just make it the default value.

On memstore flush size: the default value should be based on the nature of the table (dimension or fact) and its type (TPC-DS or OLTP), which together describe the access attribute of the data. That attribute can take a value of load_and_read or read_write. Call the attribute <access>.

So in the end, an HBASE_OPTIONS clause with a better semantic specification could be written as follows.

    HBASE_OPTIONS ( optimized for <type> , <access> data )

Examples:

    HBASE_OPTIONS ( optimized for hot, read_write data )
    HBASE_OPTIONS ( optimized for cold, load_and_read data )

And one can of course use the CQD method to provide a default HBASE_OPTIONS clause.

Just my 2 cents.

Thanks --Qifan

________________________________
From: Rohit Jain <rohit.j...@esgyn.com>
Sent: Wednesday, June 7, 2017 11:51:47 AM
To: user@trafodion.incubator.apache.org
Subject: RE: Make "HBase options" as default setting?

I am suggesting that we change the default settings to those proposed, so that all tables use those settings by default. So, it is a +1. A CQD might be used to override the default. This, hopefully, will be rare. I just think that having entries in a system DEFAULTS table is problematic for the reasons I mentioned. Of course, they would be in the compiler’s default settings.

Rohit

From: Dave Birdsall [mailto:dave.birds...@esgyn.com]
Sent: Wednesday, June 7, 2017 11:28 AM
To: user@trafodion.incubator.apache.org
Subject: RE: Make "HBase options" as default setting?

So, would this make you a -1 on the original suggestion? (That is, just changing the HBASE_OPTIONS defaults?)

From: Rohit Jain [mailto:rohit.j...@esgyn.com]
Sent: Wednesday, June 7, 2017 9:16 AM
To: user@trafodion.incubator.apache.org
Subject: RE: Make "HBase options" as default setting?

Dave,

CQDs indicate that we might make a different decision on these for certain customers. If that is not the case, then we are just increasing the complexity of configuring a system with the right default defaults, etc., and opening ourselves up to the possibility that something may go wrong. Oftentimes, folks are not even aware of what is set in the DEFAULTS table and are puzzled by the behavior, sometimes because someone added something to that table, or changed a setting and forgot to set it back, etc. Then you have to document these CQDs, when to change them, etc.
So, while the flexibility of having sooooo many buttons may be a good thing to cover all possible theoretical combinations one might come up with, from a practical standpoint the KISS (Keep It Simple Stupid) principle trumps all.

Rohit

From: Dave Birdsall [mailto:dave.birds...@esgyn.com]
Sent: Wednesday, June 7, 2017 10:44 AM
To: user@trafodion.incubator.apache.org
Subject: RE: Make "HBase options" as default setting?

What might make sense is to add a CQD (or maybe a set of them) with default values for certain HBASE_OPTIONS settings. Then one could put these CQDs in the system DEFAULTS table. So, you’d set it once in a cluster installation and then not have to worry about it afterwards.

From: Dave Birdsall
Sent: Wednesday, June 7, 2017 8:26 AM
To: user@trafodion.incubator.apache.org
Subject: RE: Make "HBase options" as default setting?

-1

Snappy is often not installed on workstations, so I would not want to make that the default.

From: Eric Owhadi [mailto:eric.owh...@esgyn.com]
Sent: Wednesday, June 7, 2017 4:52 AM
To: user@trafodion.incubator.apache.org
Subject: Re: Make "HBase options" as default setting?

+1

Sent from my Samsung Galaxy smartphone.

-------- Original message --------
From: "Liu, Yuan (Yuan)" <yuan....@esgyn.cn>
Date: 6/6/17 10:29 PM (GMT-06:00)
To: user@trafodion.incubator.apache.org
Subject: Make "HBase options" as default setting?

Hi Trafodioners,

As we know, for performance reasons we always need to add the syntax below to the “create table” statement. I know that in the latest version we have made “ATTRIBUTES ALIGNED FORMAT” the default. Do we have a plan to make HBASE_OPTIONS (including ENCODING, COMPRESSION, MEMSTORE) the default setting when creating a table? I think this would be easier for new users.

    ATTRIBUTES ALIGNED FORMAT
    HBASE_OPTIONS
    (
      DATA_BLOCK_ENCODING = 'FAST_DIFF',
      COMPRESSION = 'SNAPPY',
      MEMSTORE_FLUSH_SIZE = '1073741824'
    );

Best regards,
Yuan

Email: yuan....@esgyn.cn
Cellphone: (+86) 13671935540
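
As an illustration of the "optimized for <type>, <access> data" idea raised at the top of this thread, here is a minimal Python sketch of how such a semantic clause might be translated into raw HBASE_OPTIONS. It is not Trafodion code. The function name, the table names, the choice of no compression for image data, and both memstore flush sizes are assumptions made for the example; only the hot/cold compression guidance and the FAST_DIFF default come from the discussion.

```python
# Sketch only: a hypothetical translation of the proposed
# "OPTIMIZED FOR <type>, <access> DATA" clause into raw HBASE_OPTIONS.
# Nothing here is Trafodion code; all names are made up for illustration.

# Compression per data type, following the HBase guidance quoted in the thread.
COMPRESSION_BY_TYPE = {
    "hot": "SNAPPY",  # frequently accessed: low CPU overhead
    "cold": "GZ",     # rarely accessed: better ratio, more CPU ("GZ" is HBase's name for GZIP)
    "image": "NONE",  # assumption: precompressed values skip block compression
}

# Flush size in bytes per access pattern. The thread gives no concrete
# mapping, so both numbers are assumptions.
FLUSH_SIZE_BY_ACCESS = {
    "load_and_read": 1073741824,  # 1 GiB, the value in the original CREATE TABLE example
    "read_write": 134217728,      # 128 MiB, HBase's stock default
}


def hbase_options(access, data_type="hot"):
    """Return the raw HBASE_OPTIONS clause for a semantic specification.

    data_type defaults to "hot", as the proposal suggests for an omitted clause.
    """
    if data_type not in COMPRESSION_BY_TYPE:
        raise ValueError(f"unknown data type: {data_type!r}")
    if access not in FLUSH_SIZE_BY_ACCESS:
        raise ValueError(f"unknown access pattern: {access!r}")
    return (
        "HBASE_OPTIONS ( "
        "DATA_BLOCK_ENCODING = 'FAST_DIFF', "  # proposed as the fixed default
        f"COMPRESSION = '{COMPRESSION_BY_TYPE[data_type]}', "
        f"MEMSTORE_FLUSH_SIZE = '{FLUSH_SIZE_BY_ACCESS[access]}' )"
    )


if __name__ == "__main__":
    print(hbase_options("read_write", "hot"))
    print(hbase_options("load_and_read", "cold"))
```

Keeping the mapping in two small tables means a site could change the choice for one data type (for example, avoiding Snappy where it is not installed) without touching the clause syntax that users write.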