Hi,
I would like to set up an HBase table that lets users perform reads only
(gets and scans). Users don't need to perform inserts or updates at the
moment, but I will have to load the data into the tables before users can
query them.
I could use a composite row key, "brand:date:users", where brand is a
4-letter code for each brand, date is DD-MM-YYYY and users is the metric
(how many people bought a certain brand). This would give me a rather tall
table with millions of rows and few columns (maybe 2 at most).
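To check how such a composite key would be ordered, here is a minimal sketch in Python (it does not talk to HBase). It assumes only that HBase sorts rows by the raw bytes of the row key; the helper `row_key` and the brand code `ABCD` are hypothetical. It compares the DD-MM-YYYY date from the proposal with a YYYYMMDD alternative.

```python
from datetime import date

# Hypothetical helper: builds the composite key brand:date:users.
# HBase keeps rows sorted by the raw bytes of the row key, so sorting the
# encoded keys mirrors the order in which a scan would return them.
def row_key(brand: str, day: date, users: int, date_fmt: str) -> bytes:
    return f"{brand}:{day.strftime(date_fmt)}:{users}".encode("ascii")

days = [date(2012, 1, 31), date(2012, 2, 1), date(2012, 12, 1)]

# Date as DD-MM-YYYY, as proposed.
ddmmyyyy = sorted(row_key("ABCD", d, 10, "%d-%m-%Y") for d in days)
# Date as YYYYMMDD (an assumed alternative, for comparison).
yyyymmdd = sorted(row_key("ABCD", d, 10, "%Y%m%d") for d in days)

print(", ".join(k.decode() for k in ddmmyyyy))
# → ABCD:01-02-2012:10, ABCD:01-12-2012:10, ABCD:31-01-2012:10
print(", ".join(k.decode() for k in yyyymmdd))
# → ABCD:20120131:10, ABCD:20120201:10, ABCD:20121201:10
```

With DD-MM-YYYY, 31-01-2012 sorts after 01-12-2012, so a date-range scan within one brand would not come back in date order, while YYYYMMDD sorts chronologically. Also, because the users count is part of the key, a get would need that count in advance; storing it as a cell value instead is one possible alternative.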
Or would it be better to have a wider table with users:date as the row key
and the brands as a column family? There are many brands to track on a
daily basis. People using my table will need to select a particular brand,
a group of brands, or all brands to retrieve and display data.
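In the tall design, the three kinds of query (one brand, a group of brands, all brands) map onto range scans over the sorted row keys. A minimal sketch, assuming a brand:YYYYMMDD key (the users count left out of the key) and using a sorted Python list as a hypothetical stand-in for the table; `scan` mimics HBase scan semantics, where the start row is inclusive and the stop row exclusive.

```python
from bisect import bisect_left

# Hypothetical stand-in for a tall table: row keys kept in sorted byte
# order, as HBase does. Keys are brand:YYYYMMDD (assumed format).
rows = sorted([
    b"ABCD:20120101", b"ABCD:20120102",
    b"EFGH:20120101", b"EFGH:20120102",
    b"IJKL:20120101",
])

def scan(start: bytes, stop: bytes) -> list:
    """Mimic an HBase scan: start row inclusive, stop row exclusive."""
    return rows[bisect_left(rows, start):bisect_left(rows, stop)]

def prefix_stop(prefix: bytes) -> bytes:
    """Smallest key above every key starting with prefix
    (increments the last byte; assumes it is not 0xFF)."""
    return prefix[:-1] + bytes([prefix[-1] + 1])

# One brand: the range [b"ABCD:", b"ABCD;").
one_brand = scan(b"ABCD:", prefix_stop(b"ABCD:"))

# A group of brands: one range scan per brand.
group = [k for b in (b"ABCD", b"IJKL")
         for k in scan(b + b":", prefix_stop(b + b":"))]

# All brands: a full scan (b"\xff" sorts after every ASCII key here).
everything = scan(b"", b"\xff")
```

In the wide users:date design, the same queries would instead read one column, a few columns, or the whole brand family from every date row.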
If I recall correctly, tall tables are recommended when one is not doing
atomic operations; is that right? Does a get/scan in HBase take any row
locks?
Having a tall table means the data can be spread across more regions on
different nodes in my cluster. I have a small test cluster of 3 nodes at
the moment.
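On spreading data across nodes: HBase splits a table into regions, each holding a contiguous range of row keys. A minimal sketch of how keys map to regions, assuming a table pre-split at the hypothetical split points `H` and `P` (three regions, roughly one per node in a 3-node test cluster).

```python
from bisect import bisect_right

# Hypothetical split points: regions cover [start, H), [H, P), [P, end).
# Each region holds a contiguous range of row keys.
split_keys = [b"H", b"P"]

def region_for(row_key: bytes) -> int:
    """Index of the region whose key range contains row_key."""
    return bisect_right(split_keys, row_key)

keys = [b"ABCD:20120101", b"EFGH:20120101", b"IJKL:20120101", b"WXYZ:20120101"]
placement = {k.decode(): region_for(k) for k in keys}
print(placement)
# → {'ABCD:20120101': 0, 'EFGH:20120101': 0, 'IJKL:20120101': 1, 'WXYZ:20120101': 2}
```

With a brand-first key, all dates for one brand stay in one key range, while a daily load touches every brand and so, given reasonably chosen split points, writes to all regions.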
I also intend to add other metrics (quantity, price, etc.) and types
(brands, products, campaigns, etc.), so my table will grow fast and hold
lots of data.
If I use the type (brand, campaign, product) as part of the row key, my
rows will number in the millions over time; if I make the type a column
family, I will end up with wider rows and fewer of them.
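To compare the two layouts once several types are involved, here is a minimal sketch with made-up sample codes and dates, modelling each table as a Python dict keyed by row key. The tall layout puts type:code:date in the row key; the wide layout uses the date as the row key and type:code as family:qualifier. All names and the placeholder metric values are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample data: codes per type, and the days being tracked.
types = {"brand": ["ABCD", "EFGH"], "product": ["P001", "P002", "P003"]}
dates = ["20120101", "20120102"]

tall = {}                  # row key type:code:date -> {column: value}
wide = defaultdict(dict)   # row key date -> {"family:qualifier": value}

for t, codes in types.items():
    for code, d in product(codes, dates):
        tall[f"{t}:{code}:{d}"] = {"m:users": 0}  # placeholder metric
        wide[d][f"{t}:{code}"] = 0                # family = type

print(len(tall), "tall rows;", len(wide), "wide rows")
# → 10 tall rows; 2 wide rows
```

The tall layout grows by one row per code per day; the wide layout grows by one row per day and one column per code. The HBase reference guide advises keeping the number of column families small (two or three), so one family per type is only workable while the number of types stays small.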
Thanks,
Usman