On 21 January 2012 at 19:16, Doug Meil <[email protected]> wrote:
>
> One other "big picture" comment: HBase scales by having lots of servers,
> and servers with multiple drives. While single-read performance is
> obviously important, there is more to HBase than a single-server RDBMS
> drag-race comparison. It's a distributed architecture (as with MapReduce).
>
> re: "hbase is not so good in case of wide tables, hbase prefers tall
> tables"
>
> Per... http://hbase.apache.org/book.html#schema.smackdown this is
> absolutely true in the extreme cases as described in the book, but I
> wouldn't consider hundreds or thousands of attributes to be in that
> category as the definition of "wide" tends to be subjective.

This statement mostly applies to schemas where the column name is a subkey,
for example a time series for some attribute. Such a design does not scale
well and is not handled well by HBase, because splits are performed only on
row boundaries.

>
> On 1/21/12 8:52 AM, "Doug Meil" <[email protected]> wrote:
>
>>
>>Also, for #2 HBase supports large-scale aggregation through MapReduce.
>>
>>
>>On 1/21/12 7:47 AM, "Andrey Stepachev" <[email protected]> wrote:
>>
>>>2012/1/21 Praveen Sripati <[email protected]>:
>>>> Hi,
>>>>
>>>> 1) According to this URL (1), HBase performs well for two or three
>>>> column families. Why is it so?
>>>
>>>First, each column family is stored in a separate location, so, as stated
>>>in '6.2.1. Cardinality of ColumnFamilies', such a schema design can lead
>>>to many small pieces for a small column family, and aggregation is likely
>>>to be slow.
>>>Second, if a region splits, all column families split too,
>>>which can be inefficient when there are many of them.
>>>Third, the number of memstores: each column family
>>>has its own memstore, so it is more likely to hit forced flushes
>>>and blocked writes.
>>>
>>>>
>>>> 2) A dump of an HFile looks like the one below. The contents of a row
>>>> stay together like a regular row-oriented database. If the column family
>>>> has 100 column family qualifiers and is dense then the data for a
>>>> particular column family qualifier is spread wide. If I want to do an
>>>> aggregation on a particular column identifier, the disk seeks don't seem
>>>> to be much better than a regular row-oriented database.
>>>
>>>You don't need seeks for each column; HBase reads blocks and filters
>>>out unneeded data.
>>>And most performance is gained from collocated keys and compression.
>>>BTW, hbase is not so good in case of wide tables, hbase prefers tall
>>>tables.
>>>
>>>>
>>>> Please correct me if I am wrong.
>>>>
>>>> K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
>>>> K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
>>>> K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
>>>> K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
>>>> K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
>>>>
>>>> (1) - http://hbase.apache.org/book/number.of.cfs.html
>>>>
>>>> Thanks,
>>>> Praveen
>>>
>>>
>>>
>>>--
>>>Andrey.
>>>
>>
>>
>>
>
>

--
Andrey.
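
To illustrate the point about splits happening only on row boundaries, here is
a minimal Python sketch. It is a toy model, not HBase code: the sample data,
the key formats, and the helpers `wide_layout`, `tall_layout` and `split_rows`
are all assumptions made up for this example, and real region splits are
triggered by region size rather than by a cell count.

```python
from collections import defaultdict

# Hypothetical time series samples: (metric, timestamp, value).
SAMPLES = [("cpu", ts, ts % 7) for ts in range(1000, 1010)]


def wide_layout(samples):
    """One row per metric; the timestamp is the column qualifier (a subkey)."""
    rows = defaultdict(dict)
    for metric, ts, value in samples:
        rows[metric][f"{ts:010d}"] = value
    return dict(rows)


def tall_layout(samples):
    """One row per sample; the timestamp is part of the row key."""
    return {f"{metric}/{ts:010d}": {"v": value} for metric, ts, value in samples}


def split_rows(table, max_cells):
    """Toy region split: cut only between rows, never inside one.

    Returns a list of regions, each a list of row keys. A row holding
    more than max_cells cells still stays whole in a single region.
    """
    regions, current, cells = [], [], 0
    for key in sorted(table):
        n = len(table[key])
        if current and cells + n > max_cells:
            regions.append(current)
            current, cells = [], 0
        current.append(key)
        cells += n
    if current:
        regions.append(current)
    return regions


wide = wide_layout(SAMPLES)
tall = tall_layout(SAMPLES)
wide_regions = split_rows(wide, max_cells=4)
tall_regions = split_rows(tall, max_cells=4)

# The single wide row cannot be divided; the tall rows spread over regions.
print(len(wide_regions), len(tall_regions))
```

In this toy model the wide table stays in one region however many samples the
row accumulates, while the tall table is cut into several regions that could
be served by different servers.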
