Ugh. Redo.  I added a pointer to David Buttler's response above as an
intro to secondary indexing issues in hbase.
St.Ack

On Fri, Mar 25, 2011 at 10:09 AM, Stack <[email protected]> wrote:
> I added a pointer to the below to our book as 'intro to secondary indexing
> in hbase'.
> St.Ack
>
> On Fri, Mar 25, 2011 at 8:39 AM, Buttler, David <[email protected]> wrote:
>> Do you know what it means to make secondary indexing a feature?  There are 
>> two reasonable outcomes:
>> 1) adding ACID semantics (and thus killing scalability)
>> 2) allowing the secondary index to be out of date (leading to every naïve 
>> user claiming that there is a serious bug that must be fixed).
>>
>> Secondary indexes are basically another way of storing (part of) the data.  
>> E.g. another table, sorted on the field(s) that you want to search on.  In 
>> order to ensure consistency between the primary table and the secondary 
>> table (index), you have to guarantee that when you mutate the primary table, 
>> the secondary table is mutated in the same atomic transaction.  Since 
>> HBase only has row-level locks, this can't be guaranteed across tables.
>>
>> The situation is not hopeless, because in many cases you don't need to have 
>> perfectly consistent data and can afford to wait for cleanup tasks.  For 
>> some applications, you can ensure that the index is updated close enough to 
>> the table update (using external transactions, or something similar) that 
>> users would never notice.  One way to implement an eventually consistent 
>> secondary index would be to mimic the way cluster replication is done.
>>
>> However, what I have described is difficult to do generically -- and there 
>> are engineering tradeoffs that need to be made.  If you absolutely need a 
>> transactional and consistent secondary index, I would suggest using Oracle, 
>> MySQL, or another relational database, where this was designed in as a 
>> primary feature.  Just don't complain that they are too slow or don't scale 
>> as well as HBase.
>>
>> </rant>
>>
>> Sorry for the rant.  If you want to have a secondary index here is what you 
>> need to do:
>> Modify your application so that every time you write to the primary table, 
>> you also write to a secondary table, keyed off of the values you want to 
>> search on.  If you can't guarantee that the values form a secondary key 
>> (i.e. are unique across your entire table), you can make your key a compound 
>> key (see, for example, how "tsuna" designed OpenTSDB) with your primary key 
>> as a component.
>>
>> Then, when you need to query, you can do range queries over the secondary 
>> table to retrieve the keys in the primary table to return the full data row.
>>
>> Dave
>>
>> -----Original Message-----
>> From: Wei Shung Chung [mailto:[email protected]]
>> Sent: Friday, March 25, 2011 12:04 AM
>> To: [email protected]
>> Subject: Re: Stargate+hbase
>>
>> I need to use secondary indexing too, hopefully this important feature
>> will be made available soon :)
>>
>> Sent from my iPhone
>>
>> On Mar 25, 2011, at 12:48 AM, Stack <[email protected]> wrote:
>>
>>> There is no native support for secondary indices in HBase (currently).
>>> You will have to manage it yourself.
>>> St.Ack
>>>
>>> On Thu, Mar 24, 2011 at 10:47 PM, sreejith P. K. <[email protected]> wrote:
>>>> I have tried secondary indexing. It seems I am missing some points.
>>>> Could you please explain how it is possible using secondary indexing?
>>>>
>>>>
>>>> I have tried like,
>>>>
>>>>
>>>>                Columnfamily1:kwd1
>>>>                Columnfamily1:kwd2
>>>> row1         Columnfamily1:kwd3
>>>>                Columnfamily1:kwd2
>>>>
>>>>                Columnfamily1:kwd1
>>>>                Columnfamily1:kwd2
>>>> row2         Columnfamily1:kwd4
>>>>                Columnfamily1:kwd5
>>>>
>>>>
>>>> I need to get all rows which contain kwd1 and kwd2
>>>>
>>>> Please help.
>>>> Thanks
>>>>
>>>>
>>>> On Thu, Mar 24, 2011 at 9:57 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>
>>>>> What you are asking for is a secondary index, and it doesn't exist at
>>>>> the moment in HBase (let alone REST). Googling a bit for "hbase
>>>>> secondary indexing" will show you how people usually do it.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Mar 24, 2011 at 6:18 AM, sreejith P. K. <[email protected]> wrote:
>>>>>> Is it possible, using the stargate interface to hbase, to fetch all
>>>>>> rows where more than one column family:<qualifier> must be present?
>>>>>>
>>>>>> like: select rows which contain keyword:a and keyword:b?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sreejith PK
>>>> Nesote Technologies (P) Ltd
>>>>
>>
>
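
For readers landing on this thread: the recipe Dave describes (write to a secondary table on every write to the primary table, use a compound key that ends in the primary key, then range-scan the secondary table) can be sketched roughly as below. This is only an illustration under stated assumptions. `SortedTable` is a hypothetical in-memory stand-in for a table sorted by row key, not the HBase client API, and the names `put_row`, `rows_with_keyword`, `rows_with_all` and the `\x00` separator are made up for the example. As the thread points out, the two writes are not atomic, so a real index built this way can be briefly out of date.

```python
import bisect

SEP = "\x00"  # assumed separator between the indexed value and the primary key


class SortedTable:
    """Hypothetical in-memory stand-in for a table kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> stored value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Range scan: yield every key that starts with prefix, in order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key


primary = SortedTable()
index = SortedTable()


def put_row(row_key, keywords):
    """Write the primary row, then one index row per keyword.

    The writes are separate operations, so the index may lag the primary
    table if the process fails in between.
    """
    primary.put(row_key, set(keywords))
    for kw in keywords:
        # Compound key: indexed value + separator + primary key.
        index.put(kw + SEP + row_key, "")


def rows_with_keyword(kw):
    """Range-scan the index for one keyword and return primary row keys."""
    prefix = kw + SEP
    return {key[len(prefix):] for key in index.scan_prefix(prefix)}


def rows_with_all(*keywords):
    """Rows containing every keyword: intersect the per-keyword scans."""
    sets = [rows_with_keyword(kw) for kw in keywords]
    return set.intersection(*sets) if sets else set()


put_row("row1", ["kwd1", "kwd2", "kwd3"])
put_row("row2", ["kwd1", "kwd2", "kwd4", "kwd5"])
put_row("row3", ["kwd2", "kwd5"])

both = rows_with_all("kwd1", "kwd2")
full_rows = {key: primary.get(key) for key in both}
```

The sketch does not remove old index rows when a primary row's keywords change; a real application would need to handle that itself, for example in a cleanup task.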
