2012/1/21 Praveen Sripati <[email protected]>:
> Hi,
>
> 1) According to this URL (1), HBase performs well for two or three
> column families. Why is it so?

First, each column family is stored in a separate location, so, as stated
in '6.2.1. Cardinality of ColumnFamilies', such a schema design can leave a
small column family spread across many small pieces, and aggregation over
it should perform slowly.

Second, when a region splits, all of its column families split too. With a
large number of column families this can be inefficient.

Third, there is the number of memstores. Each column family has its own
memstore, so with many families you are more likely to hit forced flushes
and blocked writes.

>
> 2) Dump of a HFile, looks like below. The contents of a row stay together
> like a regular row-oriented database. If the column family has 100 column
> family qualifiers and is dense then the data for a particular column family
> qualifier is spread wide. If I want to do an aggregation on a particular
> column identifier, the disk seeks don't seem to be much better than a
> regular row-oriented database.

You don't need a seek for each column: HBase reads whole blocks and filters
out the unneeded data. Most of the performance is gained from collocated
keys and compression. BTW, HBase is not so good with wide tables; it
prefers tall tables.

>
> Please correct me if I am wrong.
>
> K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
> K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
> K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
> K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
> K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
>
> (1) - http://hbase.apache.org/book/number.of.cfs.html
>
> Thanks,
> Praveen

--
Andrey.
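
P.S. A rough sketch of the "read blocks and filter" and "tall tables" points, in plain Python rather than the HBase API. It is a toy model under stated assumptions: the key layout is taken only from the dump lines quoted above, and `parse_dump_line`, `scan_qualifier`, `to_tall` and `prefix_indices` are hypothetical helper names made up for illustration.

```python
from typing import List, NamedTuple


class KeyValue(NamedTuple):
    row: str
    family: str
    qualifier: str
    timestamp: int
    kind: str
    value: str


def parse_dump_line(line: str) -> KeyValue:
    """Parse 'K: row/family:qualifier/timestamp/type/vlen=N V: value'.

    Assumes the row key itself contains no '/', as in the quoted dump.
    """
    key_part, value = line.split(" V: ", 1)
    key = key_part[len("K: "):]
    row, column, timestamp, kind, _vlen = key.split("/")
    family, qualifier = column.split(":", 1)
    return KeyValue(row, family, qualifier, int(timestamp), kind, value)


# The five dump lines quoted in the thread.
DUMP = [
    "K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50",
    "K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50",
    "K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51",
    "K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51",
    "K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52",
]


def scan_qualifier(cells: List[KeyValue], family: str, qualifier: str) -> List[KeyValue]:
    """Wide layout: read every cell in order and drop the ones not wanted."""
    return [c for c in cells if c.family == family and c.qualifier == qualifier]


def to_tall(cells: List[KeyValue]) -> List[KeyValue]:
    """Hypothetical tall layout: move the qualifier to the front of the row key.

    Cells are re-sorted by row key, newest timestamp first within a row.
    """
    tall = [c._replace(row=f"{c.qualifier}/{c.row}", qualifier="v") for c in cells]
    return sorted(tall, key=lambda c: (c.row, -c.timestamp))


def prefix_indices(cells: List[KeyValue], prefix: str) -> List[int]:
    """Positions of the cells whose row key starts with the given prefix."""
    return [i for i, c in enumerate(cells) if c.row.startswith(prefix)]


if __name__ == "__main__":
    cells = [parse_dump_line(line) for line in DUMP]
    print(scan_qualifier(cells, "colfam1", "50"))
    print(prefix_indices(to_tall(cells), "50/"))
```

In the tall variant the cells of interest form one contiguous run of sorted keys, so a scan bounded by start and stop row only touches the blocks that hold them. In the wide variant the same cells are interleaved with every other qualifier of each row, so the scan reads everything and discards most of it.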
