Just to add a note to the comment of J-D:

You want more than one column family (say CF-A and CF-B) only when most of
your application (or one part of it) reads the information stored in CF-A
and does not care about the information in CF-B. In that case, moving the
less-used information into a separate column family reduces the read
overhead of the most common use case.
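
To put rough numbers on this, here is a minimal sketch (not HBase code, just
arithmetic in Python) that combines J-D's point that every value carries its
full key with the fact that each column family is stored separately. The
fixed per-cell overhead follows the KeyValue layout as I understand it (key
length, value length, row length, family length, timestamp, key type). The
row keys, qualifiers and value sizes are made up for illustration, and the
estimate ignores block indexes, compression and HDFS replication.

```python
# Fixed bytes per cell in the KeyValue layout (assumed here):
# key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1).
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_size(row, family, qualifier, value):
    """Approximate on-disk size of one cell, stored with its full key."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


def bytes_per_family(cells):
    """Sum cell sizes per column family, since each family has its own store."""
    totals = {}
    for row, family, qualifier, value in cells:
        size = cell_size(row, family, qualifier, value)
        totals[family] = totals.get(family, 0) + size
    return totals


# Hypothetical data: a small, frequently read value in family "A" and a
# large, rarely read value in family "B" for each of 1000 rows.
cells = []
for i in range(1000):
    row = b"user-%08d" % i
    cells.append((row, b"A", b"name", b"x" * 20))
    cells.append((row, b"B", b"history", b"y" * 2000))

totals = bytes_per_family(cells)
read_a_only = totals[b"A"]              # what the common use case touches
read_everything = sum(totals.values())  # cost if both lived in one family

print("CF-A only:", read_a_only, "bytes")
print("CF-A + CF-B:", read_everything, "bytes")
```

With these made-up sizes the common read path touches only a small fraction
of the stored bytes. The same arithmetic also shows why long row keys and
qualifiers inflate small values: the key is repeated for every cell.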

-Debashis

On Thu, Nov 11, 2010 at 12:04 PM, Jeff Whiting <[email protected]> wrote:

> Just to clarify, each column family is stored separately from each other.
>  But within a column family each rowkey => key / value is stored
> independently.  I was under the impression that a rowkey would point to
> multiple key / value pairs within the column family stores.  Am I
> understanding everything correctly?
>
> So looking at http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture under
> "Physical Storage View" it looks like multiple key / values are stored under
> one rowkey.  However it should show the rowkey repeated for each time stamp
> key / value combination.  If that is true then I understand why compression
> is so important (lots of redundant data).
>
> ~Jeff
>
>
> On 11/9/2010 10:46 PM, Jean-Daniel Cryans wrote:
>
>> Each value is stored with its full key e.g. row key + family +
>> qualifier + timestamp + offsets. You don't give any information
>> regarding how you stored the data, but if you have large enough keys
>> then it should easily explain the bloat.
>>
>> J-D
>>
>> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar<[email protected]>
>>  wrote:
>>
>>> Hi,
>>>
>>>     Data seems to be taking up too much space when I put into HBase. e.g,
>>> I have a 2 GB text file which seems to be taking up ~70 GB when I dump
>>> into HBase. I have block size set to 64 MB and replication=3, which I
>>> think is the possible reason for this expansion. But if that is the
>>> case, how can I prevent it? Decreasing the block size will have a
>>> negative impact on performance, so is there a way I can increase the
>>> average size on HBase-created files to be comparable to 64 MB. Right
>>> now they are ~5 MB on average. Or is this an entirely different thing
>>> at work here?
>>>
>>> thanks,
>>> hari
>>>
>>>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> [email protected]
>
>


-- 
- DEBASHIS SAHA

2519 Honeysuckle Ln
Rolling Meadows, IL 60008, USA

1-(847) 925 - 5071 (H);
1-(312)-731- 6414 (M)
--~<O>~--
