To make better use of the block cache, see HBASE-4218, "Data Block Encoding of KeyValues" (aka delta encoding / prefix compression), which is in 0.94 and above.

To reduce the size of HFiles, please see: http://hbase.apache.org/book.html#compression

On Mon, Jan 27, 2014 at 2:40 PM, Nick Xie <[email protected]> wrote:

> Tom,
>
> Yes, you are right. According to this analysis
> (http://prafull-blog.blogspot.in/2012/06/how-to-calculate-record-size-of-hbase.html),
> if it is right, the overhead is quite big when the cell value occupies
> only a small portion of the cell.
>
> In the analysis in that link, the overhead is actually 10x!!!! (The real
> values only take 12B, and it costs 123B in HBase to store them...) Is that
> real????
>
> In this case, should we combine some of the values to reduce the overhead?
>
> Thanks,
>
> Nick
>
> On Mon, Jan 27, 2014 at 2:33 PM, Tom Brown <[email protected]> wrote:
>
> > I believe each cell stores its own copy of the entire row key, column
> > family, column qualifier, and timestamp. Could that account for the
> > increase in size?
> >
> > --Tom
> >
> > On Mon, Jan 27, 2014 at 3:12 PM, Nick Xie <[email protected]> wrote:
> >
> > > I'm importing a set of data into HBase. The CSV file contains 82
> > > entries per line: an 8-byte ID, followed by a 16-byte date, and then
> > > 80 numbers of 4 bytes each.
> > >
> > > The current HBase schema is: ID as the row key, the date in a 'date'
> > > family with a 'value' qualifier, and the rest in another family called
> > > 'readings' with 'P0', 'P1', 'P2', ... through 'P79' as qualifiers.
> > >
> > > I'm testing this on a single-node cluster with HBase running in
> > > pseudo-distributed mode (no replication, no compression for HBase).
> > > After importing a 150MB CSV file from HDFS (no replication), I checked
> > > the table size, and it shows ~900MB, which is 6x larger than the file
> > > is in HDFS.
> > >
> > > Why is there such a large overhead? Am I doing anything wrong here?
> > >
> > > Thanks,
> > >
> > > Nick
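
As a rough check on the per-cell overhead discussed in this thread, the arithmetic can be sketched as below. This is only an estimate under stated assumptions: it uses the classic KeyValue layout without tags (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type), it assumes the values are stored at the byte widths given in the original post, and it ignores HFile-level costs such as block indexes, bloom filters, checksums, and MVCC sequence ids. The function names are illustrative and not part of any HBase API.

```python
# Fixed bytes stored with every cell in the classic KeyValue layout (no tags):
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1).
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # 20 bytes


def cell_size(row_len, family, qualifier, value_len):
    """Estimated serialized size of one cell, in bytes."""
    return FIXED_OVERHEAD + row_len + len(family) + len(qualifier) + value_len


def row_size(row_len=8, date_len=16, n_readings=80, reading_len=4):
    """Estimated size of one logical row under the schema from the thread."""
    # One cell in the 'date' family with the 'value' qualifier.
    total = cell_size(row_len, "date", "value", date_len)
    # One cell per reading in the 'readings' family, qualifiers P0..P79.
    for i in range(n_readings):
        total += cell_size(row_len, "readings", f"P{i}", reading_len)
    return total


# Payload per row if only the ID, date, and readings were stored.
RAW_BYTES = 8 + 16 + 80 * 4

if __name__ == "__main__":
    stored = row_size()
    print(f"raw payload:  {RAW_BYTES} bytes")
    print(f"stored cells: {stored} bytes")
    print(f"ratio:        {stored / RAW_BYTES:.1f}x")
```

Under these assumptions each 4-byte reading costs a little over 40 bytes once the row key, family, qualifier, and timestamp are repeated alongside it, which is in line with the order of magnitude reported in the thread. Shorter family and qualifier names, or packing several readings into one cell, shrink that figure directly; data block encoding and compression reduce it on disk without changing the schema.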
