Ah, so the bloat is not because the files are 5-6 MB in size? Wouldn't a 6 MB file occupy 64 MB if I set the block size to 64 MB?
hari

On Wed, Nov 10, 2010 at 11:16 AM, Jean-Daniel Cryans <[email protected]> wrote:

> Each value is stored with its full key, e.g. row key + family +
> qualifier + timestamp + offsets. You don't give any information
> regarding how you stored the data, but if you have large enough keys
> then it should easily explain the bloat.
>
> J-D
>
> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar <[email protected]> wrote:
> > Hi,
> >
> > Data seems to be taking up too much space when I put it into HBase.
> > E.g., I have a 2 GB text file which seems to take up ~70 GB when I
> > dump it into HBase. I have the block size set to 64 MB and
> > replication=3, which I think is the possible reason for this
> > expansion. But if that is the case, how can I prevent it? Decreasing
> > the block size would have a negative impact on performance, so is
> > there a way I can increase the average size of HBase-created files to
> > be comparable to 64 MB? Right now they are ~5 MB on average. Or is
> > there an entirely different thing at work here?
> >
> > thanks,
> > hari
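[Editor's note: J-D's point, that every cell is stored with its full key, can be sketched with back-of-the-envelope arithmetic. The sizes below are hypothetical illustration values, not measurements from Hari's table, and the fixed per-cell overhead is an approximation of HBase's KeyValue layout, so the resulting multiplier is only indicative.]

```python
# Hedged sketch: why raw data can balloon when loaded into HBase.
# Each cell (KeyValue) is written with its full key -- row key, column
# family, qualifier, timestamp, and length fields -- not just the value.
# All concrete sizes here are hypothetical, chosen for illustration.

def hbase_cell_bytes(value_len, row_key_len, family_len, qualifier_len):
    """Approximate on-disk bytes for one cell, before compression."""
    # Approximate fixed per-cell overhead: key length (4) + value
    # length (4) + row length (2) + family length (1) + timestamp (8)
    # + key type (1) = 20 bytes.
    fixed = 4 + 4 + 2 + 1 + 8 + 1
    return value_len + row_key_len + family_len + qualifier_len + fixed

# Hypothetical load: 20 million cells averaging 100-byte values.
cells = 20_000_000
raw_gb = cells * 100 / 1e9                       # ~2 GB of raw values
per_cell = hbase_cell_bytes(100, 30, 2, 10)      # 30-byte row keys, etc.
stored_gb = cells * per_cell / 1e9               # one replica, uncompressed
total_gb = 3 * stored_gb                         # HDFS replication factor 3

print(f"raw={raw_gb:.2f} GB, one replica={stored_gb:.2f} GB, "
      f"with 3x replication={total_gb:.2f} GB")
```

Even with these modest key sizes, the value's share of each cell is only 100 of 162 bytes, and replication triples the rest, so a few-fold expansion over the raw file is unsurprising; long row keys or qualifiers push the multiplier much higher.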
