Re: Disk Seeks and Column families

Doug Meil Sat, 21 Jan 2012 07:17:31 -0800

One other "big picture" comment:  HBase scales by having lots of servers,
and servers with multiple drives. While single-read performance is
obviously important, there is more to HBase than a single-server RDBMS
drag-race comparison.  It's a distributed architecture (as with MapReduce).


re:  "hbase is not so good in case of wide tables, hbase prefers tall
tables"  

Per http://hbase.apache.org/book.html#schema.smackdown, this is
absolutely true in the extreme cases described in the book, but I
wouldn't consider hundreds or thousands of attributes to be in that
category, as the definition of "wide" tends to be subjective.
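
To make "wide" versus "tall" concrete, here is a rough sketch of the same
data laid out both ways. It is plain Python, not HBase client code, and the
device/metric names, the "d" family and the "|" key separator are all made
up for illustration.

```python
# Hypothetical readings: (device, metric, value). Names are illustrative only.
readings = [
    ("dev1", "temp", 21),
    ("dev1", "humidity", 40),
    ("dev2", "temp", 19),
]

def wide_layout(readings):
    """One row per device; each metric becomes a column qualifier."""
    rows = {}
    for device, metric, value in readings:
        rows.setdefault(device, {})["d:" + metric] = value
    return rows

def tall_layout(readings):
    """One row per (device, metric); the metric moves into the row key."""
    return {f"{device}|{metric}": {"d:v": value}
            for device, metric, value in readings}

wide = wide_layout(readings)  # 2 rows, up to 2 qualifiers each
tall = tall_layout(readings)  # 3 rows, 1 qualifier each
```

A few hundred metrics per device in the wide form is still a modest row;
the extreme cases in the book are a different order of magnitude.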




On 1/21/12 8:52 AM, "Doug Meil" <[email protected]> wrote:

>
>Also, for #2, HBase supports large-scale aggregation through MapReduce.
>
>
>
>
>On 1/21/12 7:47 AM, "Andrey Stepachev" <[email protected]> wrote:
>
>>2012/1/21 Praveen Sripati <[email protected]>:
>>> Hi,
>>>
>>> 1) According to this URL (1), HBase performs well for two or three
>>> column families. Why is it so?
>>
>>First, each column family is stored in a separate location, so, as stated
>>in '6.2.1. Cardinality of ColumnFamilies', such a schema design can lead
>>to many small pieces for a small column family, and aggregation is likely
>>to perform slowly.
>>Second, if a region splits, all column families split too;
>>with a large number of them this can be inefficient.
>>Third, there is the number of memstores. Each column family
>>has its own memstore, so it is more likely to hit forced flushes
>>and blocked writes.
>>
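
A toy model of the "many small pieces" effect, assuming the region-wide
flush behaviour described in the book section linked as (1): when one
family's memstore fills, every family in the region is flushed with it.
This is plain Python with made-up sizes, not HBase code, and the `simulate`
helper is hypothetical.

```python
def simulate(writes, families, flush_size):
    """Record flushed file sizes per family under a region-wide flush rule.

    writes: iterable of (family, nbytes). When any one memstore reaches
    flush_size, every non-empty memstore in the region is flushed.
    """
    memstore = {f: 0 for f in families}
    files = {f: [] for f in families}
    for family, nbytes in writes:
        memstore[family] += nbytes
        if memstore[family] >= flush_size:
            for f in families:
                if memstore[f] > 0:
                    files[f].append(memstore[f])  # one new store file
                    memstore[f] = 0
    return files

# Hypothetical workload: 'big' receives 100x the bytes of 'small'.
writes = [("big", 100), ("small", 1)] * 50
files = simulate(writes, ["big", "small"], flush_size=1000)
```

Both families end up with the same number of files, but the files for
'small' are tiny, which is the pattern that later costs extra compaction
and read work.
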
>>>
>>> 2) A dump of an HFile looks like the one below. The contents of a row
>>> stay together like a regular row-oriented database. If the column family
>>> has 100 column family qualifiers and is dense then the data for a
>>> particular column family qualifier is spread wide. If I want to do an
>>> aggregation on a particular column identifier, the disk seeks don't seem
>>> to be much better than a regular row-oriented database.
>>
>>You don't need a seek for each column; HBase reads blocks and filters
>>out unneeded data.
>>And most of the performance is gained from collocated keys and compression.
>>BTW, HBase is not so good in the case of wide tables; HBase prefers tall
>>tables.
>>
>>>
>>> Please correct me if I am wrong.
>>>
>>> K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
>>> K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
>>> K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
>>> K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
>>> K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
>>>
>>> (1) - http://hbase.apache.org/book/number.of.cfs.html
>>>
>>> Thanks,
>>> Praveen
>>
>>
>>
>>-- 
>>Andrey.
>>
>
>
>
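
As a rough illustration of the "reads blocks and filters" point above, the
sketch below sorts cells the way the quoted dump appears to be ordered (row,
family, qualifier, newest timestamp first), groups them into blocks, and
sums one qualifier with a single sequential pass. It is a simplified model
in plain Python: the `KeyValue` tuple and helper names are hypothetical, and
real blocks are sized in bytes rather than cell counts.

```python
from collections import namedtuple

KeyValue = namedtuple("KeyValue", "row family qualifier timestamp value")

def hfile_order(kv):
    # Row, family, qualifier ascending; newest timestamp first,
    # as in the dump quoted in this thread.
    return (kv.row, kv.family, kv.qualifier, -kv.timestamp)

def make_blocks(kvs, block_size):
    """Split sorted cells into fixed-count blocks (a simplification)."""
    ordered = sorted(kvs, key=hfile_order)
    return [ordered[i:i + block_size]
            for i in range(0, len(ordered), block_size)]

def sum_qualifier(blocks, family, qualifier):
    """Read blocks in order; keep only the newest version of one column."""
    total, blocks_read, seen = 0, 0, set()
    for block in blocks:
        blocks_read += 1  # one sequential block read, no per-column seek
        for kv in block:
            wanted = kv.family == family and kv.qualifier == qualifier
            if wanted and kv.row not in seen:
                seen.add(kv.row)  # first hit per row is the newest version
                total += kv.value
    return total, blocks_read

# 3 rows x 4 qualifiers x 2 versions; values are made up.
kvs = [KeyValue(f"row-{r}", "colfam1", f"q{q}", ts, r * 10 + q)
       for r in range(3) for q in range(4) for ts in (100, 200)]
blocks = make_blocks(kvs, block_size=5)
total, blocks_read = sum_qualifier(blocks, "colfam1", "q2")
```

The cost here is proportional to the number of blocks scanned, not to the
number of columns per row, so a dense 100-qualifier family means more bytes
read per aggregate rather than more seeks.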
