Hi All,

Sorry for the late reply; I got stuck on another task at work on Friday, and skimming through HBASE-4676 took me a while.

HBASE-6093 seems to be very close to my suggestion. The only difference is that Matt mentioned in the description that it can only be used when all inserts are type=Put. Is that restriction due to HFileV2? I think deleting an entire row wouldn't be a problem, right? I have very little knowledge of HFileV2; I will try to read about it soon.

HBASE-4676 seems really cool. IMHO, the current issue is that writes and scans are slow (~2x slower than with NONE encoding, if we assume the trie compresses by ~2-3x), and, as per the JIRA, if the value/key size ratio is large, the trie won't have any impact. Is this feature going to be part of a future HBase release? Awesome stuff, Matt.

@Anoop: You meant that I should use the feature in HBASE-4676 and pass the timestamp as 0L in each Put, right?

Thanks all for your valuable time and input.

-Anil

On Thu, May 24, 2012 at 11:22 PM, Matt Corgan <[email protected]> wrote:

> Hi Anil,
>
> I created HBASE-6093 <https://issues.apache.org/jira/browse/HBASE-6093>
> with an idea that could solve this problem. It could be a simple
> implementation for simple workloads, but it gets harder to support for
> tables with TTLs, maxVersions > 1, Deletes, etc. Maybe it can only be
> enabled if the other ColumnFamily settings are compatible.
>
> Matt
>
>
> On Thu, May 24, 2012 at 9:37 PM, Ted Yu <[email protected]> wrote:
>
> > What Anoop said is in 0.94.0.
> >
> > For trunk, HBASE-4676 provides trie data block encoding.
> > It suits the write-once, read-many use case very well.
> >
> > Cheers
> >
> > On Thu, May 24, 2012 at 5:57 PM, Anoop Sam John <[email protected]>
> > wrote:
> >
> > > Hi Anil,
> > > There is no way to avoid storing the timestamp with KVs. In your
> > > case, you could consider using data block encoding; see
> > > FastDiffDeltaEncoder and DiffKeyDeltaEncoder. These include a way of
> > > avoiding writing the 8-byte timestamp into each KV. Some bytes will
> > > still be written, though, and this is done at the block level. Also
> > > please note that these encoders do much more than the timestamp space
> > > optimization. You also need to make sure to pass some timestamp in
> > > your Puts; it may be best to use 0L. Otherwise, HBase will assign the
> > > current time as the timestamp on the RS side. When you read the
> > > javadoc for these encoder classes, this should become clearer.
> > >
> > > Having a feature that fully avoids the timestamp, as you describe,
> > > is a topic for discussion.
> > >
> > > Hope I made it clear to you.
> > >
> > > -Anoop-
> > > ________________________________________
> > > From: anil gupta [[email protected]]
> > > Sent: Friday, May 25, 2012 3:21 AM
> > > To: [email protected]
> > > Subject: Disable timestamp in HBase Table a.k.a Disable Versioning in
> > > HBase Table
> > >
> > > Hi All,
> > >
> > > We are planning to store data in HBase. In one of our use cases,
> > > once a row is written into an HBase table we won't be modifying the
> > > data of that row. Since a timestamp (a long value) is stored for every
> > > cell (right?) in HBase, this takes up an extra 8 bytes per cell. I was
> > > wondering whether there is a way to disable the timestamp on an HBase
> > > table when versioning is not required. I went through the
> > > documentation and searched the mailing list but could not find
> > > anything relevant. Since we are talking about billions of cells, this
> > > would add up to a significant amount of space (around 7.45 GiB for
> > > 1 billion cells). Does this sound like a feature HBase is missing?
> > >
> > > Please share your thoughts.
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Gupta

--
Thanks & Regards,
Anil Gupta

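Anoop's point, that the encoders avoid writing the full 8-byte timestamp into every KV while "still some bytes will be written" at the block level, together with his advice to pass a fixed 0L timestamp, can be illustrated with a toy model. This is a hypothetical, simplified scheme and not the actual on-disk format of FastDiffDeltaEncoder or DiffKeyDeltaEncoder (see their javadoc for that). It stores the first timestamp of a block in full and every later one as the difference from the previous timestamp, using the fewest whole bytes, which is zero bytes when the difference is zero. The helper names and the sample timestamps are made up for illustration.

```python
from typing import Sequence


def bytes_needed(n: int) -> int:
    """Fewest whole bytes that can hold the non-negative integer n (0 -> 0 bytes)."""
    return (n.bit_length() + 7) // 8


def raw_timestamp_bytes(timestamps: Sequence[int]) -> int:
    """Plain KeyValue layout: every cell carries a full 8-byte timestamp."""
    return 8 * len(timestamps)


def diff_timestamp_bytes(timestamps: Sequence[int]) -> int:
    """Toy block-level diff scheme (hypothetical, NOT HBase's real format).

    The first timestamp in the block is stored in full (8 bytes); each later
    timestamp is stored as |current - previous| in the fewest whole bytes.
    We assume the sign and byte count fit into a per-cell flag byte that
    the encoder writes anyway, so they are not counted here.
    """
    if not timestamps:
        return 0
    total = 8
    for prev, cur in zip(timestamps, timestamps[1:]):
        total += bytes_needed(abs(cur - prev))
    return total


block_size = 1000
fixed = [0] * block_size  # every Put passes 0L as its timestamp
# Hypothetical server-assigned millisecond times, ticking by 1 ms per cell.
server_assigned = [1337900000000 + i for i in range(block_size)]

print(raw_timestamp_bytes(fixed))             # 8000
print(diff_timestamp_bytes(fixed))            # 8
print(diff_timestamp_bytes(server_assigned))  # 1007
```

Under this toy model a constant timestamp costs almost nothing per cell, whereas timestamps that change from cell to cell cost at least one byte each time they change, which is one reason to set the timestamp explicitly instead of letting the RegionServer assign the current time.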