Just to add a note to J-D's comment: you want more than one column family (CF-A and CF-B) only when most of your application (or one part of it) reads the information stored in CF-A and does not care about the information in CF-B. In that case, separating the less-used information into a different column family reduces the read overhead of the most common application use case.
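As a rough illustration of that point, here is a toy Python model (not the HBase client API; the `ToyTable` class, its methods, and the sample sizes are all made up for this sketch). It assumes each column family has its own store and that every cell carries its full key (row key + family + qualifier + an 8-byte timestamp), as J-D describes in the quoted message below. Offsets and other per-cell bookkeeping are ignored.

```python
from collections import defaultdict


class ToyTable:
    """Toy model with one separate store per column family (hypothetical, not HBase)."""

    TIMESTAMP_BYTES = 8  # assumption: timestamp stored as a 64-bit long

    def __init__(self):
        # family -> list of (row, qualifier, timestamp, value) cells
        self.stores = defaultdict(list)

    def put(self, row, family, qualifier, timestamp, value):
        # Each cell is appended to the store of its own family only.
        self.stores[family].append((row, qualifier, timestamp, value))

    def scan(self, family):
        """Return the cells of one family and the bytes read to fetch them."""
        cells = self.stores[family]
        # Every cell repeats its full key, so the key bytes are counted per cell.
        bytes_read = sum(
            len(row) + len(family) + len(qualifier) + self.TIMESTAMP_BYTES + len(value)
            for row, qualifier, _timestamp, value in cells
        )
        return cells, bytes_read


table = ToyTable()
for i in range(1000):
    row = b"row-%04d" % i
    table.put(row, b"A", b"name", 1, b"x" * 10)    # small, frequently read data
    table.put(row, b"B", b"blob", 1, b"y" * 1000)  # large, rarely read data

_, hot_bytes = table.scan(b"A")   # the common use case touches only CF-A
_, cold_bytes = table.scan(b"B")
print(hot_bytes, cold_bytes)
```

In this model a scan of CF-A never touches the CF-B store, so the common read path pays only for the small cells. If both kinds of data lived in one family, the same scan would have to read past the large values as well.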
-Debashis

On Thu, Nov 11, 2010 at 12:04 PM, Jeff Whiting <[email protected]> wrote:

> Just to clarify, each column family is stored separately from each other.
> But within a column family each rowkey => key / value is stored
> independently. I was under the impression that a rowkey would point to
> multiple key / value pairs within the column family stores. Am I
> understanding everything correctly?
>
> So looking at http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture under
> "Physical Storage View" it looks like multiple key / values are stored under
> one rowkey. However it should show the rowkey repeated for each time stamp
> key / value combination. If that is true then I understand why compression
> is so important (lots of redundant data).
>
> ~Jeff
>
> On 11/9/2010 10:46 PM, Jean-Daniel Cryans wrote:
>
>> Each value is stored with it's full key e.g. row key + family +
>> qualifier + timestamp + offsets. You don't give any information
>> regarding how you stored the data, but if you have large enough keys
>> then it should easily explain the bloat.
>>
>> J-D
>>
>> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar<[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> Data seems to be taking up too much space when I put into HBase. e.g,
>>> I have a 2 GB text file which seems to be taking up ~70 GB when I dump
>>> into HBase. I have block size set to 64 MB and replication=3, which I
>>> think is the possible reason for this expansion. But if that is the
>>> case, how can I prevent it? Decreasing the block size will have a
>>> negative impact on performance, so is there a way I can increase the
>>> average size on HBase-created files to be comparable to 64 MB. Right
>>> now they are ~5 MB on average. Or is this an entirely different thing
>>> at work here?
>>>
>>> thanks,
>>> hari
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> [email protected]

--
- DEBASHIS SAHA
2519 Honeysuckle Ln
Rolling Meadows, IL 60008, USA
1-(847) 925 - 5071 (H); 1-(312)-731- 6414 (M)
--~<O>~--
