100 writes/updates per minute is a very low rate, and HBase can, of course,
sustain fewer than 2 updates/sec (unless each update is gigabytes in size).
1000 concurrent users with minimal query latency is probably possible, but we
do not have enough information:
 What is the SLA? What are the requests-per-second and latency requirements?
 How large is the typical result set?

You will definitely need to keep your hot data set in RAM. If you can afford
to store the data twice, and ACID transactions are not a must-have feature:

Have two rows per asset item:
rowkey1: asset_key + update_time
rowkey2: update_time + asset_key
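
A minimal sketch of how the two row keys above might be encoded, assuming
update_time is a millisecond timestamp and asset keys never contain a NUL
byte. The function names and the separator are illustrative, not part of any
HBase API. The point is that HBase orders rows by raw key bytes, so the
timestamp should be fixed-width and big-endian for byte order to match
chronological order:

```python
import struct

SEP = b"\x00"  # assumption: asset keys never contain a NUL byte


def asset_first_key(asset_key: str, update_ms: int) -> bytes:
    """rowkey1: asset_key + update_time (all versions of one asset are adjacent)."""
    return asset_key.encode("utf-8") + SEP + struct.pack(">Q", update_ms)


def time_first_key(update_ms: int, asset_key: str) -> bytes:
    """rowkey2: update_time + asset_key (rows sort chronologically)."""
    return struct.pack(">Q", update_ms) + asset_key.encode("utf-8")


# Byte-wise sorting of the encoded keys follows the numeric timestamps.
keys = [time_first_key(ts, "asset-1") for ts in (1_000, 9, 256)]
ordered = [struct.unpack(">Q", k[:8])[0] for k in sorted(keys)]

# Decimal strings would not sort this way: "1000" < "256" < "9".
as_text = sorted(["1000", "9", "256"])
```

The separator in rowkey1 keeps a prefix scan for asset "A1" from also matching
"A10".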

This basically gives you two covering indexes, one by asset_key and one by
update_time. Because you duplicate the data, you replace the many random
lookups a simple index would require with a single scan over the
corresponding row keys.

On each asset update, insert two rows (you can keep them in the same table),
and make sure you have enough RAM (cache) to keep everything in memory.
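
To illustrate the idea, here is a toy simulation in Python. It is not HBase
client code: SortedTable is a hypothetical in-memory stand-in for a table
sorted by row key, and the "a:" / "t:" prefixes are an assumed way to keep
both kinds of rows apart inside one table. Each update writes the full row
twice, and each query becomes a single range scan:

```python
import bisect
import struct


class SortedTable:
    """Toy stand-in for an HBase table: rows kept sorted by row key bytes."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, key: bytes, value: dict) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start: bytes, stop: bytes):
        """Yield (key, value) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


A, T = b"a:", b"t:"  # hypothetical prefixes for the two key ranges


def ts_bytes(ms: int) -> bytes:
    return struct.pack(">Q", ms)  # fixed-width, big-endian


def prefix_stop(prefix: bytes) -> bytes:
    """Smallest key above every key with this prefix (last byte must be < 0xff)."""
    return prefix[:-1] + bytes([prefix[-1] + 1])


def write_update(table, asset_key: str, update_ms: int, data: dict) -> None:
    """Store the full row under both keys (data is duplicated on purpose)."""
    row = dict(data, asset_key=asset_key, update_ms=update_ms)
    ak = asset_key.encode("utf-8")
    table.put(A + ak + b"\x00" + ts_bytes(update_ms), row)  # rowkey1
    table.put(T + ts_bytes(update_ms) + ak, row)            # rowkey2


def updates_since(table, since_ms: int):
    """One scan over the time-ordered keys, starting at since_ms."""
    return table.scan(T + ts_bytes(since_ms), prefix_stop(T))


def history_of(table, asset_key: str):
    """One scan over all rows of a single asset."""
    prefix = A + asset_key.encode("utf-8") + b"\x00"
    return table.scan(prefix, prefix_stop(prefix))


table = SortedTable()
write_update(table, "A1", 100, {"price": 1})
write_update(table, "A10", 150, {"price": 7})
write_update(table, "A1", 200, {"price": 2})

recent = [(r["asset_key"], r["update_ms"]) for _, r in updates_since(table, 150)]
a1_history = [r["update_ms"] for _, r in history_of(table, "A1")]
```

Here recent holds the updates for A10 (at 150) and A1 (at 200), and
a1_history holds 100 and 200. The two writes are not atomic with respect to
each other, which is why this only fits if ACID transactions are not required.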


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [email protected]

________________________________________
From: Steven Wu [[email protected]]
Sent: Tuesday, December 10, 2013 3:35 PM
To: [email protected]
Subject: hbase schema design

Hi

   I am very new to HBase, still self-learning, and doing a POC for our
current project.  I have a question about row key design.

I have created a big table (called the asset table) with more than 50M
records. Each asset has a unique key (let's call it asset_key).

This table receives continuous updates from an upstream system (around 100
updates per min). The clients would like to receive real-time updates from
us. In the current system, we have two indexed columns (asset_key, update_ts)
on the asset DB table, so the clients can query the DB table by update_ts
for the latest updates. However, the DB has now become a bottleneck.

So we are wondering how we could achieve the same functionality in HBase. I
don't want to use a scan filter on the column, as it will trigger a full
table scan (correct me if I am wrong on this).

The best thing I can think of is to build the timestamp into the row key.
However, we still have a requirement that clients can query data by the
unique asset_key.

The use case we have is that the system has to support more than 1000
concurrent users querying the latest updates from this table at the lowest
possible latency. Also, clients would like to query data by the unique
asset_key to retrieve records from our system.

Really appreciate your thoughts on this.

Regards,
Steven
