I added a pointer to the message below to our book as 'intro to secondary indexing in hbase'.
St.Ack

On Fri, Mar 25, 2011 at 8:39 AM, Buttler, David <[email protected]> wrote:
> Do you know what it means to make secondary indexing a feature? There are
> two reasonable outcomes:
> 1) adding ACID semantics (and thus killing scalability)
> 2) allowing the secondary index to be out of date (leading to every naïve
> user claiming that there is a serious bug that must be fixed).
>
> Secondary indexes are basically another way of storing (part of) the data.
> E.g. another table, sorted on the field(s) that you want to search on. In
> order to ensure consistency between the primary table and the secondary table
> (index), you have to guarantee that when you mutate the primary table that
> the secondary table is mutated in the same atomic transaction. Since HBase
> only has row-level locks, this can't be guaranteed across tables.
>
> The situation is not hopeless, because in many cases you don't need to have
> perfectly consistent data and can afford to wait for cleanup tasks. For some
> applications, you can ensure that the index is updated close enough to the
> table update (using external transactions, or something similar) that users
> would never notice. One way to implement an eventually consistent secondary
> index would be to mimic the way cluster replication is done.
>
> However, what I have described is difficult to do generically -- and there
> are engineering tradeoffs that need to be made. If you absolutely need a
> transactional and consistent secondary index, I would suggest using Oracle,
> MySQL, or another relational database, where this was designed in as a
> primary feature. Just don't complain that they are too slow or don't scale
> as well as HBase.
>
> </rant>
>
> Sorry for the rant. If you want to have a secondary index here is what you
> need to do:
> Modify your application so that every time you write to the primary table,
> you also write to a secondary table, keyed off of the values you want to
> search on. If you can't guarantee that the values form a secondary key (i.e.
> are unique across your entire table), you can make your key a compound key
> (see, for example, how "tsuna" designed OpenTSDB) with your primary key as a
> component.
>
> Then, when you need to query, you can do range queries over the secondary
> table to retrieve the keys in the primary table to return the full data row.
>
> Dave
>
> -----Original Message-----
> From: Wei Shung Chung [mailto:[email protected]]
> Sent: Friday, March 25, 2011 12:04 AM
> To: [email protected]
> Subject: Re: Stargate+hbase
>
> I need to use secondary indexing too, hopefully this important feature
> will be made available soon :)
>
> Sent from my iPhone
>
> On Mar 25, 2011, at 12:48 AM, Stack <[email protected]> wrote:
>
>> There is no native support for secondary indices in HBase (currently).
>> You will have to manage it yourself.
>> St.Ack
>>
>> On Thu, Mar 24, 2011 at 10:47 PM, sreejith P. K. <[email protected]> wrote:
>>> I have tried secondary indexing. It seems I miss some points. Could you
>>> please explain how it is possible using secondary indexing?
>>>
>>> I have tried like,
>>>
>>>             Columnamilty1:kwd1
>>>             Columnamilty1:kwd2
>>> row1        Columnamilty1:kwd3
>>>             Columnamilty1:kwd2
>>>
>>>             Columnamilty1:kwd1
>>>             Columnamilty1:kwd2
>>> row2        Columnamilty1:kwd4
>>>             Columnamilty1:kwd5
>>>
>>> I need to get all rows which contain kwd1 and kwd2
>>>
>>> Please help.
>>> Thanks
>>>
>>> On Thu, Mar 24, 2011 at 9:57 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>
>>>> What you are asking for is a secondary index, and it doesn't exist at
>>>> the moment in HBase (let alone REST). Googling a bit for "hbase
>>>> secondary indexing" will show you how people usually do it.
>>>>
>>>> J-D
>>>>
>>>> On Thu, Mar 24, 2011 at 6:18 AM, sreejith P. K. <[email protected]> wrote:
>>>>> Is it possible using stargate interface to hbase, fetch all rows where
>>>>> more than one column family:<qualifier> must be present?
>>>>>
>>>>> like :select rows which contains keyword:a and keyword:b ?
>>>>>
>>>>> Thanks
>>>>>
>>>
>>> --
>>> Sreejith PK
>>> Nesote Technologies (P) Ltd
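
For readers who want to see Dave's recipe in concrete form, here is a minimal sketch of it applied to the kwd1/kwd2 question from the start of the thread. It is not HBase client code: SortedTable is a hypothetical in-memory stand-in for a table whose rows are sorted by key, and the names put_row, rows_with, rows_with_all, the "\x00" separator and the extra row3 are all assumptions made for illustration. Each write to the primary table also writes one index row per keyword, keyed as keyword + separator + primary key (the compound key), and a query is a range scan over the index followed by an intersection.

```python
import bisect

SEP = "\x00"  # assumed separator between indexed value and primary key


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan(self, start, stop):
        """Yield (key, value) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


primary = SortedTable()  # row key -> set of keywords
index = SortedTable()    # "<keyword>\x00<row key>" -> empty marker


def put_row(row_key, keywords):
    """Write the primary row, then one index row per keyword.

    The two writes are NOT atomic, so the index can briefly lag the
    primary table; this is the eventual-consistency tradeoff from the thread.
    """
    primary.put(row_key, set(keywords))
    for kwd in keywords:
        index.put(kwd + SEP + row_key, b"")


def rows_with(kwd):
    """Range-scan the index for one keyword; return primary row keys."""
    prefix = kwd + SEP
    stop = kwd + chr(ord(SEP) + 1)  # smallest key after every prefix match
    return {key[len(prefix):] for key, _ in index.scan(prefix, stop)}


def rows_with_all(*kwds):
    """Primary row keys that contain every given keyword."""
    matches = [rows_with(k) for k in kwds]
    return set.intersection(*matches) if matches else set()


# Rows from the question, plus a hypothetical row3 that lacks kwd1.
put_row("row1", ["kwd1", "kwd2", "kwd3"])
put_row("row2", ["kwd1", "kwd2", "kwd4", "kwd5"])
put_row("row3", ["kwd2", "kwd5"])
```

With this data, rows_with_all("kwd1", "kwd2") returns {"row1", "row2"}, and each key can then be looked up in the primary table to fetch the full row. The sketch leaves out deletes and updates, which would also have to remove stale index rows.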
