Also, for #2, HBase supports large-scale aggregation through MapReduce.
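To illustrate what such an aggregation does, here is a minimal single-process sketch. It runs without a cluster. It is not the HBase `TableMapper`/`TableReducer` API. The `Cell` record and the `ColumnSumModel` class are hypothetical. The map phase emits `(family:qualifier, value)` pairs for cells of one column, and the reduce phase sums them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a MapReduce column aggregation over HBase-like cells.
// Hypothetical: this is NOT the HBase TableMapper/TableReducer API.
public class ColumnSumModel {

    // Hypothetical stand-in for one HBase cell (row, family, qualifier, value).
    public record Cell(String row, String family, String qualifier, long value) {}

    // Map phase: emit (family:qualifier, value) only for cells of the requested column.
    static List<Map.Entry<String, Long>> map(List<Cell> cells, String family, String qualifier) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.family().equals(family) && c.qualifier().equals(qualifier)) {
                out.add(Map.entry(family + ":" + qualifier, c.value()));
            }
        }
        return out;
    }

    // Reduce phase: sum every value emitted under the same key.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return sums;
    }

    // Convenience wrapper: 0 when no cell of that column exists.
    public static long sumColumn(List<Cell> cells, String family, String qualifier) {
        return reduce(map(cells, family, qualifier)).getOrDefault(family + ":" + qualifier, 0L);
    }

    public static void main(String[] args) {
        // Made-up sample cells, for illustration only.
        List<Cell> cells = List.of(
                new Cell("row-550", "colfam1", "50", 50),
                new Cell("row-551", "colfam1", "51", 51),
                new Cell("row-552", "colfam1", "50", 7));
        System.out.println(sumColumn(cells, "colfam1", "50")); // 57
    }
}
```

In a real job, the map phase runs as parallel tasks. By default there is roughly one input split per region, and that is what makes the aggregation scale.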
On 1/21/12 7:47 AM, "Andrey Stepachev" <[email protected]> wrote:
>2012/1/21 Praveen Sripati <[email protected]>:
>> Hi,
>>
>> 1) According to this URL (1), HBase performs well with two or three
>> column families. Why is that?
>
>First, each column family is stored in a separate location, so, as stated in
>'6.2.1. Cardinality of ColumnFamilies', such a schema design can lead
>to many small pieces for a small column family, and aggregation over it
>will be slow.
>Second, if a region splits, all of its column families split too,
>which can be inefficient when there are many of them.
>Third, the number of memstores: each column family
>has its own memstore, so it is more likely to hit forced flushes
>and blocked writes.
>
>>
>> 2) A dump of an HFile looks like the one below. The contents of a row stay
>> together, as in a regular row-oriented database. If the column family has
>> 100 column qualifiers and is dense, then the data for a particular column
>> qualifier is spread wide. If I want to do an aggregation on a particular
>> column qualifier, the disk seeks don't seem to be much better than in a
>> regular row-oriented database.
>
>You don't need a seek for each column; HBase reads blocks and filters
>out the unneeded data.
>Most of the performance gain comes from collocated keys and compression.
>BTW, HBase is not so good with wide tables; HBase prefers tall
>tables.
>
>>
>> Please correct me if I am wrong.
>>
>> K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
>> K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
>> K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
>> K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
>> K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
>>
>> (1) - http://hbase.apache.org/book/number.of.cfs.html
>>
>> Thanks,
>> Praveen
>
>
>
>--
>Andrey.
>
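Andrey's first and third points can be illustrated with a toy model. It has one region holding a large and a small column family, and each family has its own memstore. The model assumes that flushes are decided and performed per region, which is how the HBase reference guide describes releases of this era. Under that assumption, a family carrying little data is flushed along with the busy one and ends up as many tiny files. The class name, the 1000-byte threshold, and the write sizes are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of one region with several column families, each with its own
// memstore. Assumption: the flush decision is made for the whole region (on
// the combined memstore size) and a flush writes one new file per non-empty family.
public class RegionFlushModel {
    private final long flushSizeBytes;                      // hypothetical threshold
    private final Map<String, Long> memstores = new LinkedHashMap<>();
    private final Map<String, Integer> filesPerFamily = new LinkedHashMap<>();
    private final Map<String, Long> bytesFlushed = new LinkedHashMap<>();

    public RegionFlushModel(long flushSizeBytes, String... families) {
        this.flushSizeBytes = flushSizeBytes;
        for (String f : families) {
            memstores.put(f, 0L);
            filesPerFamily.put(f, 0);
            bytesFlushed.put(f, 0L);
        }
    }

    // Buffer a write in the family's memstore; flush the whole region once the
    // combined memstore size reaches the threshold.
    public void put(String family, long bytes) {
        memstores.merge(family, bytes, Long::sum);
        long total = memstores.values().stream().mapToLong(Long::longValue).sum();
        if (total >= flushSizeBytes) {
            flushAll();
        }
    }

    // Every non-empty memstore becomes one new file, however small it is.
    private void flushAll() {
        for (Map.Entry<String, Long> e : memstores.entrySet()) {
            if (e.getValue() > 0) {
                filesPerFamily.merge(e.getKey(), 1, Integer::sum);
                bytesFlushed.merge(e.getKey(), e.getValue(), Long::sum);
                e.setValue(0L);
            }
        }
    }

    public int files(String family) { return filesPerFamily.get(family); }
    public long bytes(String family) { return bytesFlushed.get(family); }

    public static void main(String[] args) {
        RegionFlushModel region = new RegionFlushModel(1000, "big", "small");
        for (int i = 0; i < 50; i++) {
            region.put("big", 99);   // the bulk of the data
            region.put("small", 1);  // a trickle of data
        }
        System.out.println(region.files("big") + " " + region.files("small")); // 5 5
        System.out.println(region.bytes("small"));                             // 50
    }
}
```

With 50 rounds of writes, both families end up with 5 files. Each `small` file holds only 10 bytes, against 990 bytes for each `big` file. These are the "many small pieces" Andrey refers to.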

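The ordering in Praveen's dump can be reproduced with a simplified comparator. Cells sort by row, family and qualifier ascending, then by timestamp with the newest first. That is why row-550's version 1309813948188 appears before 1309812287166. The same model shows Andrey's point that a scan filters unneeded cells while reading sequentially, rather than seeking per column. Real HBase compares raw bytes and also orders by key type. The `Cell` record, `KEY_ORDER`, and `scanColumn` are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of cell order inside an HFile: row, family, qualifier
// ascending, then timestamp descending (newest version first).
// Hypothetical: real HBase compares raw bytes and also orders by key type.
public class HFileOrderModel {

    public record Cell(String row, String family, String qualifier, long ts, String value) {}

    public static final Comparator<Cell> KEY_ORDER = Comparator
            .comparing(Cell::row)
            .thenComparing(Cell::family)
            .thenComparing(Cell::qualifier)
            .thenComparing(Comparator.comparingLong(Cell::ts).reversed());

    // Read sorted cells in order (as a block scan would), skipping cells of other
    // columns and keeping only the newest version of the column for each row.
    public static List<String> scanColumn(List<Cell> sorted, String family, String qualifier) {
        List<String> out = new ArrayList<>();
        String lastRow = null;
        for (Cell c : sorted) {
            if (!c.family().equals(family) || !c.qualifier().equals(qualifier)) {
                continue;                 // unneeded cell: filtered out, no extra seek
            }
            if (c.row().equals(lastRow)) {
                continue;                 // older version of a cell already returned
            }
            out.add(c.row() + "=" + c.value());
            lastRow = c.row();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>(List.of(
                new Cell("row-552", "colfam1", "52", 1309813948256L, "52"),
                new Cell("row-550", "colfam1", "50", 1309812287166L, "50"),
                new Cell("row-551", "colfam1", "51", 1309813948222L, "51"),
                new Cell("row-550", "colfam1", "50", 1309813948188L, "50"),
                new Cell("row-551", "colfam1", "51", 1309812287200L, "51")));
        cells.sort(KEY_ORDER); // same order as the HFile dump
        for (Cell c : cells) {
            System.out.println(c.row() + "/" + c.family() + ":" + c.qualifier() + "/" + c.ts());
        }
        System.out.println(scanColumn(cells, "colfam1", "51")); // [row-551=51]
    }
}
```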