We build our indexes by keeping an index column family in the same table we are indexing, and we make sure that no index rowkey could possibly be a valid data rowkey. This does not guarantee that the data write and the index write happen in the same transaction, but it does let you batch the puts for both together.
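Roughly, the layout can be sketched like this. It is a minimal illustration rather than our actual code: the `!idx:` prefix, the family names `d` and `i`, and the `Mutation` class are all made up for the example, and `Mutation` is only a stand-in for a client put, not the HBase API.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexBatchSketch {
    // Hypothetical convention: data rowkeys may never start with this prefix,
    // so an index rowkey can never be mistaken for a data rowkey.
    static final String INDEX_PREFIX = "!idx:";
    static final String DATA_FAMILY = "d";   // assumed name of the data family
    static final String INDEX_FAMILY = "i";  // assumed name of the index family

    /** Stand-in for a put: one cell written to one row. Not the HBase client class. */
    static final class Mutation {
        final String row, family, qualifier, value;

        Mutation(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** A data rowkey is valid only if it stays out of the index keyspace. */
    static boolean isValidDataRowKey(String rowKey) {
        return !rowKey.isEmpty() && !rowKey.startsWith(INDEX_PREFIX);
    }

    /** Index rowkey: reserved prefix, then the indexed value, then the data rowkey. */
    static String indexRowKey(String indexedValue, String dataRowKey) {
        return INDEX_PREFIX + indexedValue + ":" + dataRowKey;
    }

    /** Builds the data put and its index put as one batch for the same table. */
    static List<Mutation> buildBatch(String dataRowKey, String qualifier, String value) {
        if (!isValidDataRowKey(dataRowKey)) {
            throw new IllegalArgumentException("data rowkey collides with index keyspace: " + dataRowKey);
        }
        List<Mutation> batch = new ArrayList<>();
        // The data cell goes into the data family under the data rowkey.
        batch.add(new Mutation(dataRowKey, DATA_FAMILY, qualifier, value));
        // The index cell goes into the index family and points back at the data row.
        batch.add(new Mutation(indexRowKey(value, dataRowKey), INDEX_FAMILY, "ref", dataRowKey));
        return batch;
    }

    public static void main(String[] args) {
        for (Mutation m : buildBatch("user42", "email", "a@example.com")) {
            System.out.println(m.row + " " + m.family + ":" + m.qualifier + " = " + m.value);
        }
    }
}
```

With the real client, the two puts would be handed to the table in a single list-based put call. They still touch two different rows, so they are batched but not atomic.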
-chris

On Feb 20, 2011, at 9:49 PM, Hari Sreekumar wrote:

> All right, I understand the integrity cost is there because we don't get
> transactions in HBase over multiple tables. Thanks a lot for your time and
> help :)
>
> Hari
>
> On Mon, Feb 21, 2011 at 11:09 AM, Ted Yu <[email protected]> wrote:
>
>> One particular region is for only one table. So the answer to your first
>> question is no.
>>
>> For your use case, you need to consider the cost of maintaining 4 index
>> tables (in terms of data integrity).
>> You should try to minimize the number of index tables.
>>
>> On Sun, Feb 20, 2011 at 7:50 PM, Hari Sreekumar <[email protected]> wrote:
>>
>>> So if I have 10 tables each with 2 families, I'd open up 20 stores whenever
>>> I open a region for reading? Is it a problem to have too many tables. e.g,
>>> if I have 1 big table and 4 indexing tables for the big table? Are there any
>>> potential issues with this?
>>>
>>> Thanks,
>>> Hari
>>>
>>> On Sun, Feb 20, 2011 at 8:47 PM, Ted Yu <[email protected]> wrote:
>>>
>>>>>> Does this mean that a store instance is opened for all tables present in
>>>>>> HBase irrespective of which table we are querying and for all
>>>>>> columnfamilies?
>>>> No. The blog says Store instance is for each family.
>>>>
>>>> You should generally avoid multiple column families. But we can help you
>>>> analyze your use case.
>>>> If you read through https://issues.apache.org/jira/browse/HBASE-3149, you
>>>> would better understand current implementation.
>>>>
>>>> On Sun, Feb 20, 2011 at 6:38 AM, Hari Sreekumar <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I was going through the HBase architecture blog by Lars George
>>>>> (http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html) and
>>>>> I just wanted a clarification regarding how HBase reads data. The blog
>>>>> mentions that :
>>>>>
>>>>> Next the HRegionServer opens the region it creates a corresponding
>>>>> HRegion object.
>>>>> When the HRegion is "opened" it sets up a Store instance for each
>>>>> HColumnFamily for every table as defined by the user beforehand. Each of
>>>>> the Store instances can in turn have one or more StoreFile instances, which
>>>>> are lightweight wrappers around the actual storage file called HFile. A
>>>>> HRegion also has a MemStore and a HLog instance. We will now have a look at
>>>>> how they work together but also where there are exceptions to the rule.
>>>>>
>>>>> Does this mean that a store instance is opened for all tables present in
>>>>> HBase irrespective of which table we are querying and for all
>>>>> columnfamilies? Is this why I generally see people avoiding large number of
>>>>> tables/large number of column families. If not, what is the reason for
>>>>> that?
>>>>> Is it true at all that we should avoid too many tables/CFs ?
>>>>>
>>>>> Thanks,
>>>>> Hari
