Just to clarify: each column family is stored separately from the others, but within a
column family each rowkey => key/value pair is stored independently. I was under the
impression that a rowkey would point to multiple key/value pairs within the column family
stores. Am I understanding everything correctly?
Looking at http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture under "Physical Storage View",
it looks like multiple key/values are stored under one rowkey. However, it should really show
the rowkey repeated for each timestamp + key/value combination. If that is true, then I
understand why compression is so important (lots of redundant data).
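To make that redundancy concrete, here is a rough sketch of a per-cell layout in an HBase-style store. The field names and byte sizes are illustrative assumptions, not the exact KeyValue wire format: the point is only that the full key, including the row key, travels with every single cell.

```python
# Rough sketch of per-cell storage in an HBase-style store: every cell
# carries its full key (row key + family + qualifier + timestamp + type),
# so the row key is repeated once per column per version.
# Field sizes here are illustrative assumptions, not the exact KeyValue layout.

def cell_size(row, family, qualifier, value,
              ts_bytes=8, type_bytes=1, length_fields=12):
    """Approximate on-disk bytes for one cell, before compression."""
    key = len(row) + len(family) + len(qualifier) + ts_bytes + type_bytes
    return length_fields + key + len(value)

# One logical row with 10 single-byte columns stores the 13-byte row key 10 times:
row = b"user:00000042"
cells = [cell_size(row, b"cf", b"col%d" % i, b"v") for i in range(10)]
total = sum(cells)  # key/overhead bytes dominate when values are tiny
```

With values this small, almost all of the stored bytes are repeated key material, which is exactly the redundancy that block compression recovers well.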
~Jeff
On 11/9/2010 10:46 PM, Jean-Daniel Cryans wrote:
Each value is stored with its full key, e.g. row key + family +
qualifier + timestamp + offsets. You don't give any information
about how you stored the data, but if you have large enough keys
that could easily explain the bloat.
J-D
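A back-of-the-envelope estimate along the lines J-D describes (all numbers below are assumed for illustration, not measured from Hari's table) shows how replication plus per-cell key overhead can multiply a small raw file:

```python
# Back-of-the-envelope bloat estimate; all figures are illustrative assumptions.
raw_gb = 2            # size of the original text file
replication = 3       # HDFS replication factor
value_bytes = 10      # assumed average stored value size
key_overhead = 40     # assumed full key + length fields per cell

overhead_factor = (value_bytes + key_overhead) / value_bytes  # 5x for tiny values
estimated_gb = raw_gb * replication * overhead_factor
# 2 GB * 3 * 5 = 30 GB before indexes, multiple versions, or un-compacted files
```

Even with modest per-cell keys, small values push the on-disk size well past the raw input before any other factor is considered.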
On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar<[email protected]> wrote:
Hi,
Data seems to take up too much space when I put it into HBase. E.g., I
have a 2 GB text file which seems to take up ~70 GB when I dump it into
HBase. I have the block size set to 64 MB and replication=3, which I think is
a possible reason for this expansion. But if that is the case, how can I
prevent it? Decreasing the block size would have a negative impact on
performance, so is there a way I can increase the average size of
HBase-created files to be comparable to 64 MB? Right now they are ~5 MB on
average. Or is something entirely different at work here?
thanks,
hari
--
Jeff Whiting
Qualtrics Senior Software Engineer
[email protected]