Reading how, exactly?

I think (but I am no expert) HBase is very good at sequential table scans but not quite so good at random reads. To help speed things up you can use a pathing technique in secondary index keys, i.e. build each index row key by concatenating the fields you want to look up and sort by. See here:
http://brunodumon.wordpress.com/2010/02/17/building-indexes-using-hbase-mapping-strings-numbers-and-dates-onto-bytes/

So for example you might have:

Customer
--------
CustomerId
...

Claim
-----
ClaimId
Status

IxClaim_ClaimIdCustomerId_Asc
-----
ClaimIdCustomerId

IxClaim_StatusClaimId_Asc
-----
StatusClaimId

Have a read of the article; it explains it better than I can here.

-----Original Message-----
From: N Kapshoo [mailto:[email protected]]
Sent: Sat 6/19/2010 5:52 AM
To: [email protected]
Subject: Fwd: data redundancy in hbase tables for read performance

I never heard from anyone. I would appreciate it if anyone has any insight on this...

---------- Forwarded message ----------
From: N Kapshoo <[email protected]>
Date: Wed, May 12, 2010 at 2:21 PM
Subject: data redundancy in hbase tables for read performance
To: [email protected]

For the model I am designing, read speed is the highest priority. That being said, I have a Customers table with information about Claims. Here is the design today:

Table: Customers
RowId: CustomerId
Family: Claims
Column: ClaimId
Value: JSON(ClaimId, Status, Description, From)

I am storing the ClaimsInfo as a JSON object. This JSON object will be displayed in a tabular format after querying.

Now I get an additional requirement to sort claims by status. I resolve this by adding a new Family called 'Status'. (Denormalization + Redundancy)

Table: Customers
RowId: CustomerId
Family: ClaimStatus
Column: ClaimId
Value: *String*

My concern is, do I continue down this path when more query requirements are added to the system? For example, when they want to retrieve by 'From', then I add another family called 'From'? Should I be creating a new table in that case to support the new family?

Admittedly, the data in these columns is not huge, but I am worried about doing multiple 'Puts' when the value changes. Am I on the right track by adding redundancy to keep up with read performance?

Thanks.
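
For anyone reading this thread later, here is a rough sketch of what a composite key for an index table such as IxClaim_StatusClaimId_Asc might look like. It uses plain Python with no HBase client, and a sorted list stands in for the table. The function names, the zero-byte separator, and the choice of an unsigned 8-byte ClaimId are all my own assumptions for illustration; they are not taken from the linked article or from HBase itself.

```python
import struct
from typing import Tuple

# Assumed separator between key parts; status strings must never contain it.
SEP = b"\x00"


def index_key(status: str, claim_id: int) -> bytes:
    """Build a hypothetical index row key: Status, separator, ClaimId.

    The ClaimId is packed as a fixed-width big-endian unsigned integer so
    that byte-wise (lexicographic) ordering matches ascending numeric order.
    """
    return status.encode("utf-8") + SEP + struct.pack(">Q", claim_id)


def prefix_range(status: str) -> Tuple[bytes, bytes]:
    """Return (start inclusive, stop exclusive) keys covering one status."""
    prefix = status.encode("utf-8")
    return prefix + SEP, prefix + b"\x01"


# A sorted list stands in for the index table, which keeps rows in key order.
rows = sorted(
    index_key(status, claim_id)
    for status, claim_id in [("OPEN", 300), ("CLOSED", 7), ("OPEN", 5), ("OPENED", 1)]
)

# A range scan over one status returns its claims already sorted by ClaimId.
start, stop = prefix_range("OPEN")
open_claims = [
    struct.unpack(">Q", key[-8:])[0] for key in rows if start <= key < stop
]
```

The fixed-width binary encoding is the important part: if the ClaimId were stored as text, "300" would sort before "5". With keys laid out this way, "all claims with a given status, in ClaimId order" becomes one sequential scan instead of many random reads, at the cost of an extra Put to the index table whenever a claim's status changes.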
