One other "big picture" comment: HBase scales by having lots of servers, and servers with multiple drives. While single-read performance is obviously important, there is more to HBase than a single-server RDBMS drag-race comparison. It's a distributed architecture (as with MapReduce).

re: "hbase is not so good in case of wide tables, hbase prefers tall tables"

Per...
http://hbase.apache.org/book.html#schema.smackdown
this is absolutely true in the extreme cases described in the book, but I wouldn't consider hundreds or thousands of attributes to be in that category, as the definition of "wide" tends to be subjective.


On 1/21/12 8:52 AM, "Doug Meil" <[email protected]> wrote:

>
>Also, for #2 HBase supports large-scale aggregation through MapReduce.
>
>
>On 1/21/12 7:47 AM, "Andrey Stepachev" <[email protected]> wrote:
>
>>2012/1/21 Praveen Sripati <[email protected]>:
>>> Hi,
>>>
>>> 1) According to this url (1), HBase performs well for two or three
>>> column families. Why is it so?
>>
>>First, each column family is stored in a separate location, so, as stated in
>>'6.2.1. Cardinality of ColumnFamilies', such a schema design can lead
>>to many small pieces for a small column family, and aggregation is likely
>>to perform slowly.
>>Second, if a region splits, all column families will split too;
>>with a large number of them this can be inefficient.
>>Third, related to the number of memstores: each column family
>>has its own memstore, so it is more likely to hit forced flushes
>>and blocked writes.
>>
>>>
>>> 2) Dump of a HFile, looks like below. The contents of a row stay together
>>> like a regular row-oriented database. If the column family has 100 column
>>> family qualifiers and is dense then the data for a particular column family
>>> qualifier is spread wide. If I want to do an aggregation on a particular
>>> column identifier, the disk seeks don't seem to be much better than a
>>> regular row-oriented database.
>>
>>You don't need seeks for each column; hbase reads blocks and filters
>>out unneeded data.
>>And most performance is gained from collocated keys and compression.
>>BTW, hbase is not so good in case of wide tables, hbase prefers tall
>>tables.
>>
>>>
>>> Please correct me if I am wrong.
>>>
>>> K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
>>> K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
>>> K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
>>> K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
>>> K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
>>>
>>> (1) - http://hbase.apache.org/book/number.of.cfs.html
>>>
>>> Thanks,
>>> Praveen
>>
>>
>>--
>>Andrey.
>>
>
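
For what it's worth, here is a rough sketch of the "read the block and filter" point, using only the five dump lines quoted above. It is a toy model of the text dump, not how HBase actually reads HFiles: the names KeyValue, parse_dump_line and sum_latest are made up for illustration, and it assumes row keys contain no "/" and values are integers, which holds for this dump but not in general.

```python
from typing import NamedTuple

# The HFile dump lines quoted in the thread, with the mail quoting stripped.
DUMP = [
    "K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50",
    "K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50",
    "K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51",
    "K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51",
    "K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52",
]


class KeyValue(NamedTuple):
    row: str
    family: str
    qualifier: str
    timestamp: int
    kv_type: str
    value: str


def parse_dump_line(line: str) -> KeyValue:
    """Split one dump line into row/family:qualifier/timestamp/type and value."""
    key_part, value = line.split(" V: ")
    key = key_part[len("K: "):]
    # Assumption: the row key itself contains no "/".
    row, column, timestamp, kv_type, _vlen = key.split("/")
    family, qualifier = column.split(":", 1)
    return KeyValue(row, family, qualifier, int(timestamp), kv_type, value.strip())


def sum_latest(lines, family: str, qualifier: str) -> int:
    """One sequential pass: skip unwanted cells, keep the newest version per row."""
    latest = {}
    for line in lines:
        kv = parse_dump_line(line)
        if kv.family != family or kv.qualifier != qualifier:
            continue  # filtered out in memory, no extra seek involved
        if kv.row not in latest or kv.timestamp > latest[kv.row].timestamp:
            latest[kv.row] = kv
    return sum(int(kv.value) for kv in latest.values())


print(sum_latest(DUMP, "colfam1", "51"))  # 51
```

The only point of the sketch is that the cells are already sorted and sit next to each other, so an aggregation over one qualifier is a sequential pass with a filter rather than one seek per column.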
