I'm pretty sure that's not how it's reported by the "du" command, but
I wouldn't expect to see files of 5MB on average. Can you be more
specific?

J-D

On Tue, Nov 9, 2010 at 9:58 PM, Hari Sreekumar <[email protected]> wrote:
> Ah, so the bloat is not because of the files being 5-6 MB in size? Wouldn't
> a 6 MB file occupy 64 MB if I set the block size to 64 MB?
>
> hari
>
> On Wed, Nov 10, 2010 at 11:16 AM, Jean-Daniel Cryans
> <[email protected]> wrote:
>
>> Each value is stored with its full key, e.g. row key + family +
>> qualifier + timestamp + offsets. You don't give any information
>> about how you stored the data, but if your keys are large enough,
>> that would easily explain the bloat.
>>
>> J-D
>>
>> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar <[email protected]>
>> wrote:
>> > Hi,
>> >
>> >     Data seems to be taking up too much space when I put into HBase. e.g,
>> I
>> > have a 2 GB text file which seems to be taking up ~70 GB when I dump into
>> > HBase. I have block size set to 64 MB and replication=3, which I think is
>> > the possible reason for this expansion. But if that is the case, how can
>> I
>> > prevent it? Decreasing the block size will have a negative impact on
>> > performance, so is there a way I can increase the average size on
>> > HBase-created  files to be comparable to 64 MB. Right now they are ~5 MB
>> on
>> > average. Or is this an entirely different thing at work here?
>> >
>> > thanks,
>> > hari
>> >
>>
>
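
The per-cell key repetition J-D describes can be estimated with a rough
back-of-envelope sketch. The sizes below follow the HBase KeyValue on-disk
layout (length fields, row, family, qualifier, timestamp, type byte); the
20-byte row key, one-byte family, and 10-byte qualifier are purely
hypothetical, since the thread never states the actual schema.

```python
# Rough estimate of HBase on-disk bloat from per-cell key repetition.
# KeyValue layout: 4B key length + 4B value length + 2B row length + row
# + 1B family length + family + qualifier + 8B timestamp + 1B key type,
# followed by the value bytes.

def cell_size(row, family, qualifier, value_len):
    key_len = 2 + len(row) + 1 + len(family) + len(qualifier) + 8 + 1
    return 4 + 4 + key_len + value_len

# Hypothetical schema: 20-byte row key, family "d", 10-byte qualifier,
# and small 5-byte values (as they might appear in a delimited text file).
per_cell = cell_size("r" * 20, "d", "q" * 10, 5)   # 56 bytes stored
raw_bytes = 5                                      # the value alone
bloat = per_cell / raw_bytes                       # 11.2x before replication
with_replication = bloat * 3                       # ~33.6x with replication=3
```

Under these assumed key sizes, a 2 GB text file of small values would land
in the rough neighborhood of the ~70 GB Hari reports, before even counting
compaction and HDFS metadata effects.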
