Ugh. Redo. I added a pointer to David Buttler's response above as an intro to secondary indexing issues in HBase.
St.Ack
On Fri, Mar 25, 2011 at 10:09 AM, Stack <[email protected]> wrote:
> I added pointer to below into our book as 'intro to secondary indexing
> in hbase'.
> St.Ack
>
> On Fri, Mar 25, 2011 at 8:39 AM, Buttler, David <[email protected]> wrote:
>> Do you know what it means to make secondary indexing a feature? There are
>> two reasonable outcomes:
>> 1) adding ACID semantics (and thus killing scalability)
>> 2) allowing the secondary index to be out of date (leading to every naïve
>> user claiming that there is a serious bug that must be fixed).
>>
>> Secondary indexes are basically another way of storing (part of) the data.
>> E.g. another table, sorted on the field(s) that you want to search on. In
>> order to ensure consistency between the primary table and the secondary
>> table (index), you have to guarantee that when you mutate the primary table
>> that the secondary table is mutated in the same atomic transaction. Since
>> HBase only has row-level locks, this can't be guaranteed across tables.
>>
>> The situation is not hopeless, because in many cases you don't need to have
>> perfectly consistent data and can afford to wait for cleanup tasks. For
>> some applications, you can ensure that the index is updated close enough to
>> the table update (using external transactions, or something similar) that
>> users would never notice. One way to implement an eventually consistent
>> secondary index would be to mimic the way cluster replication is done.
>>
>> However, what I have described is difficult to do generically -- and there
>> are engineering tradeoffs that need to be made. If you absolutely need a
>> transactional and consistent secondary index, I would suggest using Oracle,
>> MySQL, or another relational database, where this was designed in as a
>> primary feature. Just don't complain that they are too slow or don't scale
>> as well as HBase.
>>
>> </rant>
>>
>> Sorry for the rant.
>> If you want to have a secondary index here is what you need to do:
>> Modify your application so that every time you write to the primary table,
>> you also write to a secondary table, keyed off of the values you want to
>> search on. If you can't guarantee that the values form a secondary key
>> (i.e. are unique across your entire table), you can make your key a compound
>> key (see, for example, how "tsuna" designed OpenTSDB) with your primary key
>> as a component.
>>
>> Then, when you need to query, you can do range queries over the secondary
>> table to retrieve the keys in the primary table to return the full data row.
>>
>> Dave
>>
>> -----Original Message-----
>> From: Wei Shung Chung [mailto:[email protected]]
>> Sent: Friday, March 25, 2011 12:04 AM
>> To: [email protected]
>> Subject: Re: Stargate+hbase
>>
>> I need to use secondary indexing too, hopefully this important feature
>> will be made available soon :)
>>
>> Sent from my iPhone
>>
>> On Mar 25, 2011, at 12:48 AM, Stack <[email protected]> wrote:
>>
>>> There is no native support for secondary indices in HBase (currently).
>>> You will have to manage it yourself.
>>> St.Ack
>>>
>>> On Thu, Mar 24, 2011 at 10:47 PM, sreejith P. K. <[email protected]> wrote:
>>>> I have tried secondary indexing. It seems I miss some points. Could you
>>>> please explain how it is possible using secondary indexing?
>>>>
>>>>
>>>> I have tried like,
>>>>
>>>>
>>>>          Columnamilty1:kwd1
>>>>          Columnamilty1:kwd2
>>>> row1     Columnamilty1:kwd3
>>>>          Columnamilty1:kwd2
>>>>
>>>>          Columnamilty1:kwd1
>>>>          Columnamilty1:kwd2
>>>> row2     Columnamilty1:kwd4
>>>>          Columnamilty1:kwd5
>>>>
>>>>
>>>> I need to get all rows which contain kwd1 and kwd2
>>>>
>>>> Please help.
>>>> Thanks
>>>>
>>>>
>>>> On Thu, Mar 24, 2011 at 9:57 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>
>>>>> What you are asking for is a secondary index, and it doesn't exist at
>>>>> the moment in HBase (let alone REST).
>>>>> Googling a bit for "hbase secondary indexing" will show you how
>>>>> people usually do it.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Mar 24, 2011 at 6:18 AM, sreejith P. K. <[email protected]> wrote:
>>>>>> Is it possible using stargate interface to hbase, fetch all rows where
>>>>>> more than one column family:<qualifier> must be present?
>>>>>>
>>>>>> like :select rows which contains keyword:a and keyword:b ?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sreejith PK
>>>> Nesote Technologies (P) Ltd
>>>>
>>
>
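
For anyone reading this in the archive: the recipe Dave gives in the quoted message (write to a second table keyed on the searched value plus the primary key, then range-scan that table) can be sketched roughly as below. This is only a toy in-memory model in Python, not HBase client code. SortedTable, put_with_index, rows_with_keyword, rows_with_all and the NUL separator are all names made up for illustration, and the two writes are deliberately not atomic, which is exactly the consistency gap discussed in the thread.

```python
import bisect

# Assumed separator between indexed value and primary key; it must never
# occur inside a keyword or a row key (hypothetical convention).
SEP = "\x00"


class SortedTable:
    """Toy stand-in for an HBase table: rows kept sorted by key.

    This is NOT the HBase API, just enough to show the key design.
    """

    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, key, columns):
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def get(self, key):
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Range scan: yield every row key starting with prefix, in order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i]
            i += 1


primary = SortedTable()
index = SortedTable()


def put_with_index(row_key, keywords):
    """Write the primary row, then one index row per keyword.

    The writes are not atomic: a failure in between leaves the index
    out of date until some cleanup task repairs it.
    """
    primary.put(row_key, {"kw:" + k: "1" for k in keywords})
    for k in keywords:
        # Compound key: indexed value + primary key, so that many rows
        # sharing one keyword still get distinct index keys.
        index.put(k + SEP + row_key, {})


def rows_with_keyword(keyword):
    """Range-scan the index table and strip the prefix to get primary keys."""
    prefix = keyword + SEP
    return {key[len(prefix):] for key in index.scan_prefix(prefix)}


def rows_with_all(*keywords):
    """Intersect the per-keyword scans on the client side."""
    sets = [rows_with_keyword(k) for k in keywords]
    return set.intersection(*sets) if sets else set()


put_with_index("row1", ["kwd1", "kwd2", "kwd3"])
put_with_index("row2", ["kwd1", "kwd2", "kwd4", "kwd5"])
put_with_index("row3", ["kwd2", "kwd5"])

print(sorted(rows_with_all("kwd1", "kwd2")))  # ['row1', 'row2']
```

The "kwd1 and kwd2" question from the original post becomes two prefix scans over the index table followed by a client-side intersection; the full rows can then be fetched from the primary table by key. Whether that is cheap enough depends on how many rows share each keyword.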
