Re: Disk Seeks and Column families

Doug Meil Sat, 21 Jan 2012 05:52:46 -0800

Also, for #2, HBase supports large-scale aggregation through MapReduce.
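
To make the shape of such an aggregation concrete, here is a minimal pure-Python model of the map and reduce phases. It does not use the HBase API; in a real Java job the mapper would be fed from a table scan (typically via TableMapper and TableMapReduceUtil). The names map_phase, reduce_phase and run_job and the sample rows are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(row_key, columns):
    """Emit one (qualifier, value) pair per cell of a row.

    `columns` is a hypothetical {qualifier: value} dict standing in for
    the cells a table scan would hand to the mapper.
    """
    for qualifier, value in columns.items():
        yield qualifier, int(value)


def reduce_phase(qualifier, values):
    """Sum all values that were emitted for one qualifier."""
    return qualifier, sum(values)


def run_job(rows):
    """Run map, shuffle (group by key) and reduce in a single process."""
    shuffled = defaultdict(list)
    for row_key, columns in rows:
        for key, value in map_phase(row_key, columns):
            shuffled[key].append(value)
    return dict(reduce_phase(key, values) for key, values in shuffled.items())


# Illustrative input only, not data from the thread.
sample_rows = [
    ("row-1", {"a": "1", "b": "10"}),
    ("row-2", {"a": "2"}),
]
totals = run_job(sample_rows)
```

In a cluster the map calls run in parallel, one task per region, which is what makes the aggregation scale.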




On 1/21/12 7:47 AM, "Andrey Stepachev" <[email protected]> wrote:

>2012/1/21 Praveen Sripati <[email protected]>:
>> Hi,
>>
>> 1) According to this URL (1), HBase performs well with two or three
>> column families. Why is that?
>
>First, each column family is stored in a separate location, so, as stated in
>'6.2.1. Cardinality of ColumnFamilies', such a schema design can leave
>a small column family spread across many small pieces, and aggregation
>over it will be slow.
>Second, when a region splits, all of its column families split too,
>which can be inefficient when there are many of them.
>Third, there is the number of memstores. Each column family
>has its own memstore, so with more families it is more likely to hit
>forced flushes and blocked writes.
>
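
The "many small pieces" effect can be sketched with a toy model. It assumes that a flush is triggered by the combined memstore size of a region and that every family in the region is flushed together. The threshold, the family names and the write sizes are made-up numbers, and simulate_flushes and example_writes are hypothetical helpers, not HBase code.

```python
def simulate_flushes(writes, flush_threshold):
    """Toy model of memstore flushes for a single region.

    `writes` is an iterable of (family, nbytes). Whenever the combined size
    of all memstores reaches `flush_threshold`, every non-empty memstore is
    flushed to its own store file. Returns {family: [file sizes in bytes]}.
    """
    memstores = {}
    files = {}
    for family, nbytes in writes:
        memstores[family] = memstores.get(family, 0) + nbytes
        if sum(memstores.values()) >= flush_threshold:
            for fam, size in memstores.items():
                if size > 0:
                    files.setdefault(fam, []).append(size)
            memstores = {fam: 0 for fam in memstores}
    return files


def example_writes(num_rows):
    """Each row writes 1000 bytes to family 'big' and 10 bytes to 'small'."""
    for _ in range(num_rows):
        yield "big", 1000
        yield "small", 10


files = simulate_flushes(example_writes(1000), flush_threshold=100_000)
for family, sizes in sorted(files.items()):
    print(family, len(sizes), "files, largest", max(sizes), "bytes")
```

Both families end up with the same number of store files, but the files of the small family are roughly a hundred times smaller, so a scan over that family touches many tiny files until compaction merges them.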
>>
>> 2) A dump of an HFile looks like the one below. The contents of a row
>> stay together, as in a regular row-oriented database. If the column
>> family has 100 column qualifiers and is dense, then the data for a
>> particular column qualifier is spread wide. If I want to do an
>> aggregation on a particular column qualifier, the disk seeks don't seem
>> to be much better than in a regular row-oriented database.
>
>You don't need a seek for each column; HBase reads whole blocks and
>filters out the unneeded data.
>Most of the performance is gained from collocated keys and compression.
>BTW, HBase is not so good with wide tables; it prefers tall
>tables.
>
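
The read-then-filter idea can be illustrated with the HFile dump lines quoted in this thread. The sketch below parses lines in that format and sums one qualifier in a single sequential pass, skipping unwanted cells in memory. It assumes cells arrive sorted as in the dump (newest version of a cell first); KV_LINE, parse_dump and sum_qualifier are hypothetical names, not part of HBase.

```python
import re

# Matches lines such as:
#   K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
KV_LINE = re.compile(
    r"K: (?P<row>[^/]+)/(?P<family>[^:]+):(?P<qualifier>[^/]*)/"
    r"(?P<ts>\d+)/(?P<type>\w+)/vlen=(?P<vlen>\d+) V: (?P<value>.*)"
)


def parse_dump(lines):
    """Yield (row, family, qualifier, timestamp, value) for each dump line."""
    for line in lines:
        match = KV_LINE.search(line)
        if match:
            yield (
                match.group("row"),
                match.group("family"),
                match.group("qualifier"),
                int(match.group("ts")),
                match.group("value"),
            )


def sum_qualifier(cells, family, qualifier):
    """Sum the newest value of one column across rows in a single pass."""
    total = 0
    seen_rows = set()
    for row, fam, qual, _ts, value in cells:
        if fam != family or qual != qualifier:
            continue  # filtered out in memory; no extra seek is modelled
        if row in seen_rows:
            continue  # older version of a cell that was already counted
        seen_rows.add(row)
        total += int(value)
    return total


dump = [
    "K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50",
    "K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50",
    "K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51",
    "K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51",
    "K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52",
]
total_51 = sum_qualifier(parse_dump(dump), "colfam1", "51")
```

The cost is dominated by reading the blocks sequentially; the per-cell filtering happens on data that is already in memory.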
>>
>> Please correct me if I am wrong.
>>
>> K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
>> K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
>> K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
>> K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
>> K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
>>
>> (1) - http://hbase.apache.org/book/number.of.cfs.html
>>
>> Thanks,
>> Praveen
>
>
>
>-- 
>Andrey.
>
