schema optimisation - go for multiple tables, rows or column families?

Tom Mon, 09 Jan 2012 02:02:59 -0800

Hello,

I found most, but not all, of the answers about schemas in the HBase Book and the "Definitive Guide". Let's say there is a single row key and I use this key to add one row to each of two tables (case (1)). Could someone please confirm that, even though the tables are different, this data will end up in the same or at least adjacent regions based on the key? (I.e., my HBase client has to deal with two HTable instances, but only one region server needs to be looked up?)


Thank you,
Tom

Background:

I have two types of data: metadata (low volume) and measurement data (high volume). Requests come in, and based on an ID I need my HBase client to access both the metadata and the measurement data for that ID quickly. I want to reduce communication overhead (lookups, number of TCP connections, etc.).

Regarding how to store the two types of data in HBase, I see these three design choices. Which one should I go for?


(1) Multiple tables - single key - single column family

(2) Single table - single key - multiple column families (the HBase Book advises against this in section 6.2).

(3) Single table - multiple keys (all built so that they will be co-located and system-wide hot spots are avoided) - single column family
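Option (3) does not spell out a key layout, so here is a minimal sketch of one possible layout, under stated assumptions. Every name here is hypothetical and not taken from the post: ASCII-only IDs, a 16-bucket salt derived from the ID, and `|`-separated fields. With this layout, the metadata row and the measurement rows for one ID share a prefix and sort next to each other, while different IDs spread across buckets. The `REGIONS` map is a toy model of one table's region boundaries, not the HBase client API. HBase orders row keys by unsigned byte comparison, and `String.compareTo` matches that only for ASCII.

```java
import java.util.Map;
import java.util.TreeMap;

public class CoLocatedKeys {
    // Hypothetical salt bucket count; spreads different IDs over the key space.
    static final int BUCKETS = 16;

    // Salt derived only from the ID, so every row for one ID shares the same prefix.
    static String salt(String id) {
        return String.format("%02d", Math.floorMod(id.hashCode(), BUCKETS));
    }

    // Metadata row: <salt>|<id>|0, which sorts before all measurement rows of the same ID.
    static String metaKey(String id) {
        return salt(id) + "|" + id + "|0";
    }

    // Measurement row: <salt>|<id>|1|<zero-padded timestamp>, so rows sort by time.
    static String measurementKey(String id, long timestamp) {
        return salt(id) + "|" + id + "|1|" + String.format("%019d", timestamp);
    }

    // Toy model of one table's regions: region start key -> hosting server (hypothetical).
    static final TreeMap<String, String> REGIONS = new TreeMap<>(Map.of(
            "", "rs1",
            "04", "rs2",
            "08", "rs3",
            "12", "rs4"));

    // A row belongs to the region with the greatest start key <= the row key.
    static String serverFor(String rowKey) {
        return REGIONS.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        String id = "sensor-42";
        System.out.println(metaKey(id) + " -> " + serverFor(metaKey(id)));
        for (long ts = 1000; ts <= 3000; ts += 1000) {
            String key = measurementKey(id, ts);
            System.out.println(key + " -> " + serverFor(key));
        }
    }
}
```

In this toy model the region boundaries coincide with salt prefixes, so all rows of one ID always map to the same server. In a real table, regions split on size. If one ID accumulates a lot of measurement data, a split could fall between its rows unless splitting is constrained, so co-location is likely but not guaranteed.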
