Praveen, basically you are correct on all counts. If a row has too many columns, HBase has to issue more disk seeks to extract only the particular columns you need. And since the data is laid out horizontally (row by row), a single HBase block contains fewer common substrings, so compression quality starts to degrade due to the reduced redundancy.
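To make the seek argument concrete, here is a rough sketch in Python. It is a toy model, not HBase code. It assumes a dense family with 100 qualifiers, 50 rows, and a made-up block capacity of 100 cells (real HFile blocks are sized in bytes, 64 KB by default). All names in it (`make_cells`, `hfile_order`, `column_order`, `blocks_touched`) are hypothetical. It sorts the cells the way an HFile does (row, family, qualifier, newest timestamp first) and counts how many blocks hold at least one cell of the qualifier being aggregated, then does the same for an imaginary column-major ordering.

```python
# Toy model of cell ordering inside a store file. NOT HBase code:
# the sizes and helper names below are assumptions for illustration.
from itertools import product

NUM_ROWS = 50
NUM_QUALIFIERS = 100
CELLS_PER_BLOCK = 100  # hypothetical; real blocks are sized in bytes


def make_cells():
    """Build one cell per (row, qualifier) as (row, family, qualifier, ts)."""
    return [
        (f"row-{r:03d}", "colfam1", f"{q:03d}", 1309813948188)
        for r, q in product(range(NUM_ROWS), range(NUM_QUALIFIERS))
    ]


def hfile_order(cell):
    """Row, family, qualifier ascending; timestamp descending."""
    row, family, qualifier, ts = cell
    return (row, family, qualifier, -ts)


def column_order(cell):
    """Imaginary column-major layout: all cells of a qualifier are adjacent."""
    row, family, qualifier, ts = cell
    return (family, qualifier, row, -ts)


def blocks_touched(cells, key, qualifier):
    """Return (blocks containing the qualifier, total number of blocks)."""
    ordered = sorted(cells, key=key)
    blocks = [
        ordered[i:i + CELLS_PER_BLOCK]
        for i in range(0, len(ordered), CELLS_PER_BLOCK)
    ]
    hits = sum(1 for block in blocks if any(c[2] == qualifier for c in block))
    return hits, len(blocks)


cells = make_cells()
row_hits, total_blocks = blocks_touched(cells, hfile_order, "050")
col_hits, _ = blocks_touched(cells, column_order, "050")

print(f"row-oriented order:   {row_hits} of {total_blocks} blocks touched")
print(f"column-major order:   {col_hits} of {total_blocks} blocks touched")
```

With these numbers every block in the row-oriented order holds one cell of the target qualifier, so an aggregation over that qualifier has to read all of them, while the column-major order keeps those cells in a single block. The sketch only counts blocks; it does not model compression, the block index, or caching.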
On Sat, Jan 21, 2012 at 9:49 AM, Praveen Sripati <[email protected]> wrote:

> Thanks for the response.
>
> > The contents of a row stay together like a regular row-oriented database.
>
> > K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
> > K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
> > K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
> > K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
> > K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
>
> Is the above statement true for a HFile?
>
> Also from the above example, the data for the column family qualifier are
> not adjacent to take advantage of compression
> (http://en.wikipedia.org/wiki/Column-oriented_DBMS#Compression). Is this a
> proper statement?
>
> Regards,
> Praveen
>
> On Sat, Jan 21, 2012 at 9:03 PM, <[email protected]> wrote:
>
> > Have you considered using AggregationProtocol to perform aggregation ?
> >
> > Thanks
> >
> > On Jan 20, 2012, at 11:08 PM, Praveen Sripati <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > 1) According to the this url (1), HBase performs well for two or three
> > > column families. Why is it so?
> > >
> > > 2) Dump of a HFile, looks like below. The contents of a row stay together
> > > like a regular row-oriented database. If the column family has 100 column
> > > family qualifiers and is dense then the data for a particular column family
> > > qualifier is spread wide. If I want to do an aggregation on a particular
> > > column identifier, the disk seeks doesn't seems to be much better than a
> > > regular row-oriented database.
> > >
> > > Please correct me if I am wrong.
> > >
> > > K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
> > > K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
> > > K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
> > > K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
> > > K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
> > >
> > > (1) - http://hbase.apache.org/book/number.of.cfs.html
> > >
> > > Thanks,
> > > Praveen
