Re: Disk Seeks and Column families

Andrey Stepachev Sat, 21 Jan 2012 04:48:51 -0800

2012/1/21 Praveen Sripati <[email protected]>:
> Hi,
>
> 1) According to this URL (1), HBase performs well with two or three
> column families. Why is that?


First, each column family is stored separately, so, as stated in
'6.2.1. Cardinality of ColumnFamilies', such a schema design can leave
the data of a small column family scattered in many small pieces, and
aggregations over it will be slow.
Second, when a region splits, all of its column families split too;
with a large number of them this can be inefficient.
Third, it is related to the number of memstores. Each column family
has its own memstore, so you are more likely to hit forced flushes
and blocked writes.
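To make the first and third points concrete, here is a toy Java model. It is not HBase code. It assumes, as the book page (1) describes for current versions, that flushes happen per region: once the combined memstores of a region pass a threshold, every column family with buffered data is written out as its own file, however little data that family holds. The class name, FLUSH_SIZE and the 99:1 write mix are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not HBase code): one region holding several column families,
 * each with its own memstore. Assumes the whole region is flushed once the
 * combined memstore size reaches a hypothetical FLUSH_SIZE.
 */
public class RegionFlushSketch {
    static final long FLUSH_SIZE = 64L * 1024 * 1024; // hypothetical threshold, bytes

    // Bytes currently buffered per column family (one memstore each).
    final Map<String, Long> memstores = new LinkedHashMap<>();
    // Sizes of the files written so far, per column family.
    final Map<String, List<Long>> storeFiles = new LinkedHashMap<>();

    RegionFlushSketch(String... families) {
        for (String f : families) {
            memstores.put(f, 0L);
            storeFiles.put(f, new ArrayList<>());
        }
    }

    void put(String family, long bytes) {
        memstores.merge(family, bytes, Long::sum);
        long total = memstores.values().stream().mapToLong(Long::longValue).sum();
        if (total >= FLUSH_SIZE) {
            flushAll(); // region-wide flush in this model
        }
    }

    // Every family with buffered data gets its own file, however small.
    void flushAll() {
        for (Map.Entry<String, Long> e : memstores.entrySet()) {
            if (e.getValue() > 0) {
                storeFiles.get(e.getKey()).add(e.getValue());
                e.setValue(0L);
            }
        }
    }

    public static void main(String[] args) {
        RegionFlushSketch region = new RegionFlushSketch("big", "small");
        // 1 KB writes: one in every 100 goes to "small", the rest to "big".
        for (int i = 0; i < 200_000; i++) {
            region.put(i % 100 == 0 ? "small" : "big", 1024);
        }
        for (Map.Entry<String, List<Long>> e : region.storeFiles.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size()
                    + " files, sizes " + e.getValue());
        }
    }
}
```

In this run the "small" family ends up with three files of well under 1 MB each, next to roughly 63 MB files for "big". Every one of those small files is something later reads and compactions have to handle.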

>
> 2) A dump of an HFile looks like the one below. The contents of a row stay
> together, as in a regular row-oriented database. If the column family has 100
> column qualifiers and is dense, then the data for a particular column
> qualifier is spread wide. If I want to do an aggregation on a particular
> column qualifier, the disk seeks don't seem to be much better than in a
> regular row-oriented database.

You don't need a seek for each column: HBase reads whole blocks and filters
out the unneeded data.
Most of the performance gain comes from co-located keys and compression.
BTW, HBase is not so good with wide tables; it prefers tall tables.
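A toy Java model of that block-at-a-time read follows. Again, it is not HBase code, and it ignores the block index, Bloom filters and compression. Cells are kept sorted by row and qualifier and grouped into fixed-size blocks. Summing one qualifier reads each block once, sequentially, and drops the other qualifiers in memory. The block size (counted in cells instead of bytes), the class name and the q0/q1/q2 qualifiers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not HBase code): a sorted, HFile-like list of cells split into
 * fixed-size blocks. Aggregating one column reads every block once and
 * filters in memory; there is no separate seek per column.
 */
public class BlockScanSketch {
    // Simplified cell: timestamp and type omitted.
    static final class Cell {
        final String row, family, qualifier;
        final long value;

        Cell(String row, String family, String qualifier, long value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    static final int CELLS_PER_BLOCK = 4; // hypothetical block size, in cells

    int blocksRead = 0;

    long sumColumn(List<Cell> sortedCells, String family, String qualifier) {
        long sum = 0;
        for (int start = 0; start < sortedCells.size(); start += CELLS_PER_BLOCK) {
            // "Read" one block: a single sequential I/O in this model.
            List<Cell> block = sortedCells.subList(
                    start, Math.min(start + CELLS_PER_BLOCK, sortedCells.size()));
            blocksRead++;
            // Filter out the unneeded cells inside the block.
            for (Cell c : block) {
                if (c.family.equals(family) && c.qualifier.equals(qualifier)) {
                    sum += c.value;
                }
            }
        }
        return sum;
    }

    // Tall table: many rows, few qualifiers; cell value = row index.
    static List<Cell> tallTable(int rows, int qualifiersPerRow) {
        List<Cell> cells = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int q = 0; q < qualifiersPerRow; q++) {
                cells.add(new Cell(String.format("row-%03d", r), "colfam1", "q" + q, r));
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        BlockScanSketch scanner = new BlockScanSketch();
        long sum = scanner.sumColumn(tallTable(10, 3), "colfam1", "q1");
        System.out.println("sum=" + sum + ", blocksRead=" + scanner.blocksRead);
    }
}
```

For 10 rows with 3 qualifiers each, it prints `sum=45, blocksRead=8`: eight sequential block reads, not one seek per cell.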

>
> Please correct me if I am wrong.
>
> K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
> K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
> K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
> K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
> K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
>
> (1) - http://hbase.apache.org/book/number.of.cfs.html
>
> Thanks,
> Praveen



-- 
Andrey.
