Tom,

I would like to add to what Jonathan suggested. Approach (1) has multiple problems:

a> As Jonathan pointed out, regions are created on a per-table basis, so data from different tables will fall in different regions. There is no guarantee about which servers these regions are assigned to.

b> The greater problem I see with approach (1) is that a small metadata table may not split well into regions (splitting is size-based) and can therefore become a hot spot, since a lot of keys will fall in one region.
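To make point b> concrete, here is a rough toy model of size-based splitting. It is only a sketch: the function name, the table sizes and the 256 MB threshold are made up for illustration, and real HBase splits on store-file size (see hbase.hregion.max.filesize) at a row-key midpoint rather than by simple halving.

```python
# Toy model only, not HBase code: a region is split in half whenever
# its size exceeds the configured maximum region size.

def split_regions(table_bytes, max_region_bytes):
    """Return the region sizes after repeatedly halving oversized regions."""
    regions = [table_bytes]
    while any(size > max_region_bytes for size in regions):
        next_regions = []
        for size in regions:
            if size > max_region_bytes:
                half = size // 2
                next_regions.extend([half, size - half])
            else:
                next_regions.append(size)
        regions = next_regions
    return regions


MAX_REGION = 256 * 1024**2  # assumed split threshold (256 MB)

metadata = split_regions(50 * 1024**2, MAX_REGION)      # hypothetical 50 MB table
measurements = split_regions(10 * 1024**3, MAX_REGION)  # hypothetical 10 GB table

print(len(metadata), len(measurements))
```

With these assumed numbers the metadata table never crosses the threshold and stays in a single region, so every metadata lookup goes to whichever server hosts that one region, while the measurement table spreads over many regions.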
There is more. If you store the two kinds of data in different column families, they will in turn be stored in different store files. So when you fetch both, you will be reading from two different store files, and possibly from two different physical nodes.

So I would ask you: can you store both meta and measurement data as two different columns in the same column family? In that case, one fetch on the key for both data points will resolve to the same region and the same store file.

Just a thought.

~Kisalay

On Mon, Jan 9, 2012 at 5:21 PM, Jonathan Hsieh <[email protected]> wrote:

> Hi Tom,
>
> In the case you describe -- two HTables -- there is no guarantee that they
> will end up going to the same region server. If you have multiple tables,
> these are different regions and which can (and most likely will) be
> distributed to different regionserver machines. The fact that both tables
> use the same rowkeys doesn't matter.
>
> If you use (2), the single table with column family approach, they would be
> located in the same region and thus the same regionserver.
>
> Given your concerns, and depending on your read patterns (do you do a lot
> of scans of only the meta data?), I'd probably take approach (2) or (3).
>
> Jon.
>
> On Mon, Jan 9, 2012 at 2:01 AM, Tom <[email protected]> wrote:
>
> > Hello,
> >
> > I got most, but not all, answers about schemas from the HBase Book and
> > the "Definite Guide".
> > Let's say there is a single row key and I use this key to add to two
> > tables, one row each (case (1)).
> > Could someone please confirm that even though the tables are different,
> > based on the key, this data will end up in the same or at least adjacent
> > regions? (I.e. my hbase client has to deal with two HTable instances but
> > only one region server needs to be looked up)?
> >
> > Thank you,
> > Tom
> >
> > Background:
> > I have two types of data: meta data (low volume) and measurement data
> > (high volume); and I get requests coming in where, based on an ID, I need
> > my HBase client to be able to access both metadata and measurement data
> > for this ID quickly. I want to reduce communication overhead (lookups,
> > number of tcp connections etc).
> >
> > In regards to dealing with the two types of data in Hbase, I see these
> > three design choices, which one to go for?
> >
> > (1) Multiple tables - single key - single column family
> >
> > (2) Single table - single key - multiple column families (the HBase Book
> > advises against that in section 6.2).
> >
> > (3) Single table - multiple keys (all made in such a way that they will
> > be co-located and system wide hot spots are avoided) - single column
> > family
> >
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // [email protected]
