Re: Stargate+hbase

Weishung Chung Fri, 25 Mar 2011 10:38:51 -0700

+1 Thank you David for the great explanation. It's complicated.
I am pretty new to this BigData space, find it really interesting, and
always want to learn more about it.  I will definitely look into OpenTSDB as
suggested. Thanks again :D


On Fri, Mar 25, 2011 at 12:18 PM, Buttler, David <[email protected]> wrote:

> Hmmm.... maybe my mental model is deficient.  How do you propose building a
> secondary index without a transaction?
>
> The reason indexes work is that they store the data in a different way than
> the primary table.  That implies a second, independent data storage.
>  Without a transaction you can't be guaranteed that the second data
> structure is always updated in sync with the primary table.
>
> I suppose you could roll the multiple data writes into the initial data
> write -- that would work if you have write-once data.  But if you partially
> update the data then you have the issue that you may not have enough
> information in the update to correctly write the key for the secondary data
> stores.  This would mean (in general) that you would have to read an entire
> row before you update any part of it so that you can maintain the secondary
> structures.  Do you see the performance problem here? (or that you are
> introducing a limited transactional / eventually consistent state into the
> data store)
>
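
The read-before-write cost described above can be sketched in a few lines. This is a minimal in-memory illustration, not HBase client code: the dict `primary`, the set `index`, the indexed columns `last_name` and `city`, and the helper names are all hypothetical. It assumes the index key is built from two columns plus the row key, so an update that carries only one of those columns cannot build the new index key, or find the stale one, without first reading the stored row.

```python
# Toy stand-ins: a dict for the primary table, a set for the index table.
primary = {}   # row_key -> {column: value}
index = set()  # (last_name, city, row_key) tuples

INDEXED = ("last_name", "city")  # hypothetical indexed columns


def index_key(row, row_key):
    """Build the compound index key from the full row plus the row key."""
    return tuple(row.get(col) for col in INDEXED) + (row_key,)


def put_partial(row_key, updates):
    """Apply a partial update while keeping the index in step."""
    # Extra read: the update alone may lack some indexed columns, and the
    # old values are needed to locate the stale index entry.
    old_row = primary.get(row_key)
    new_row = {**(old_row or {}), **updates}

    # These steps are separate writes. Without a transaction, a failure
    # between them leaves the index out of sync with the primary table.
    if old_row is not None:
        index.discard(index_key(old_row, row_key))
    index.add(index_key(new_row, row_key))
    primary[row_key] = new_row


put_partial("u1", {"last_name": "Chung", "city": "Austin"})
put_partial("u1", {"city": "Dallas"})  # update carries only one column
```

After the second call the index holds a single entry for `u1` under the new city. Getting there took one read and three writes for what the caller saw as a one-column update.
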
> There may be optimizations where you could skip that part of the code if
> there were no indexes.  But now you are talking about greatly increasing the
> complexity of the codebase for a use case that is somewhat specialized.
>  Hence, you see that people who really care about secondary indexes /
> transactional hbase have separate packages.  They probably don't do the job as
> well as is ideally possible by rolling the code into hbase proper, but on
> the other hand, neither do they increase the complexity of the main code
> branch (hence they don't slow down the core development work).
>
> I will stand by my point that there are engineering trade-offs to be made.
>  Take the unix philosophy: small components, loosely coupled. If you need
> indexes, build them on top of HBase, not inside of HBase.  Using things like
> co-processors allows you to extend the capabilities of HBase in a way that
> does not impact the core product and hurt all of the other users. An example
> of this is OpenTSDB.  It is a time-series database that uses hbase under the
> covers, but it doesn't ask that hbase support its needs in some special way.
>  It is very instructive to see how it was constructed.
>
> Dave
>
>
> -----Original Message-----
> From: Weishung Chung [mailto:[email protected]]
> Sent: Friday, March 25, 2011 9:27 AM
> To: [email protected]
> Subject: Re: Stargate+hbase
>
> Thank you so much for the informative info. It really helps me out.
>
> For a secondary index, even without transactions, I would think one could
> still build a secondary index on another key, especially if we have row-level
> locking. Correct me if I am wrong.
>
> Also, I have read about the clustered B-Tree used in InnoDB to implement
> secondary indexes, but I know that the B-Tree is the primary limitation when
> it comes to scalability and the main reason why NoSQL systems have discarded
> it. But it would be super nice to be able to build the secondary index
> without using another secondary table in HBase.
>
> I am not complaining but I would love to see HBase continue to be the top
> NoSQL solution out there :D
> Way to go HBase !
>
> On Fri, Mar 25, 2011 at 10:39 AM, Buttler, David <[email protected]> wrote:
>
> > Do you know what it means to make secondary indexing a feature?  There are
> > two reasonable outcomes:
> > 1) adding ACID semantics (and thus killing scalability)
> > 2) allowing the secondary index to be out of date (leading to every naïve
> > user claiming that there is a serious bug that must be fixed).
> >
> > Secondary indexes are basically another way of storing (part of) the data,
> > e.g. another table, sorted on the field(s) that you want to search on.  In
> > order to ensure consistency between the primary table and the secondary
> > table (index), you have to guarantee that when you mutate the primary table,
> > the secondary table is mutated in the same atomic transaction.  Since
> > HBase only has row-level locks, this can't be guaranteed across tables.
> >
> > The situation is not hopeless, because in many cases you don't need to have
> > perfectly consistent data and can afford to wait for cleanup tasks.  For
> > some applications, you can ensure that the index is updated close enough to
> > the table update (using external transactions, or something similar) that
> > users would never notice.  One way to implement an eventually consistent
> > secondary index would be to mimic the way cluster replication is done.
> >
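
One way to picture the eventually consistent option is a log of primary-table edits that a background task replays into the index later. The sketch below is a hedged in-memory illustration only: it assumes a replication-like design in which each write is recorded in an ordered log, and the names `primary`, `index`, `edit_log`, `put`, and `replay_log` are hypothetical, not HBase APIs.

```python
from collections import deque

primary = {}        # row_key -> indexed value (toy primary table)
index = set()       # (value, row_key) pairs (toy index table)
edit_log = deque()  # ordered record of primary-table mutations


def put(row_key, value):
    """Write to the primary table and record the edit for later replay."""
    old = primary.get(row_key)
    primary[row_key] = value
    edit_log.append((row_key, old, value))


def replay_log():
    """Apply pending edits to the index, in order. Returns how many ran."""
    applied = 0
    while edit_log:
        row_key, old, new = edit_log.popleft()
        if old is not None:
            index.discard((old, row_key))  # drop the stale entry
        index.add((new, row_key))
        applied += 1
    return applied


put("r1", "red")
stale_view = set(index)  # index has not seen the write yet
replay_log()             # cleanup task catches the index up
```

Between `put` and `replay_log` the index lags behind the primary table, which is exactly the window in which a reader could see out-of-date results.
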
> > However, what I have described is difficult to do generically -- and there
> > are engineering tradeoffs that need to be made.  If you absolutely need a
> > transactional and consistent secondary index, I would suggest using Oracle,
> > MySQL, or another relational database, where this was designed in as a
> > primary feature.  Just don't complain that they are too slow or don't scale
> > as well as HBase.
> >
> > </rant>
> >
> > Sorry for the rant.  If you want to have a secondary index here is what you
> > need to do:
> > Modify your application so that every time you write to the primary table,
> > you also write to a secondary table, keyed off of the values you want to
> > search on.  If you can't guarantee that the values form a secondary key
> > (i.e. are unique across your entire table), you can make your key a compound
> > key (see, for example, how "tsuna" designed OpenTSDB) with your primary key
> > as a component.
> >
> > Then, when you need to query, you can do range queries over the secondary
> > table to retrieve the keys in the primary table to return the full data row.
> >
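
The recipe above (write to a secondary table keyed on the searched value plus the primary key, then range-scan it) can be sketched with a toy sorted table. This is an in-memory illustration under stated assumptions, not the HBase client API: `SortedTable`, `write_row`, `rows_with`, `rows_with_all`, and the `\x00` separator are all hypothetical, and the sample rows reuse the row1/row2 keyword layout from the question earlier in the thread.

```python
import bisect


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan(self, start, stop):
        """Yield (key, value) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


SEP = "\x00"  # assumed separator that never occurs in keywords or row keys

primary = SortedTable()
index = SortedTable()


def write_row(row_key, keywords):
    """Write the primary row, then one index row per keyword."""
    primary.put(row_key, set(keywords))
    for kwd in keywords:
        # Compound key: searched value + separator + primary key.
        index.put(kwd + SEP + row_key, "")


def rows_with(kwd):
    """Range-scan the index for one keyword and return primary row keys."""
    prefix = kwd + SEP
    stop = kwd + chr(ord(SEP) + 1)  # first key past the prefix range
    return {key[len(prefix):] for key, _ in index.scan(prefix, stop)}


def rows_with_all(first, *rest):
    """Intersect the per-keyword scans to answer 'kwd1 AND kwd2'."""
    result = rows_with(first)
    for kwd in rest:
        result &= rows_with(kwd)
    return result


write_row("row1", ["kwd1", "kwd2", "kwd3"])
write_row("row2", ["kwd1", "kwd2", "kwd4", "kwd5"])
both = rows_with_all("kwd1", "kwd2")
```

Here `both` contains the two row keys, which can then be fetched from the primary table. Nothing ties the two writes in `write_row` together, so the consistency caveats discussed in this thread still apply.
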
> > Dave
> >
> > -----Original Message-----
> > From: Wei Shung Chung [mailto:[email protected]]
> > Sent: Friday, March 25, 2011 12:04 AM
> > To: [email protected]
> > Subject: Re: Stargate+hbase
> >
> > I need to use secondary indexing too, hopefully this important feature
> > will be made available soon :)
> >
> > Sent from my iPhone
> >
> > On Mar 25, 2011, at 12:48 AM, Stack <[email protected]> wrote:
> >
> > > There is no native support for secondary indices in HBase (currently).
> > > You will have to manage it yourself.
> > > St.Ack
> > >
> > > On Thu, Mar 24, 2011 at 10:47 PM, sreejith P. K. <[email protected]> wrote:
> > >> I have tried secondary indexing. It seems I am missing some points.
> > >> Could you please explain how it is possible using secondary indexing?
> > >>
> > >>
> > >> I have tried like,
> > >>
> > >>
> > >> row1         ColumnFamily1:kwd1
> > >>              ColumnFamily1:kwd2
> > >>              ColumnFamily1:kwd3
> > >>              ColumnFamily1:kwd2
> > >>
> > >> row2         ColumnFamily1:kwd1
> > >>              ColumnFamily1:kwd2
> > >>              ColumnFamily1:kwd4
> > >>              ColumnFamily1:kwd5
> > >>
> > >>
> > >> I need to get all rows which contain kwd1 and kwd2
> > >>
> > >> Please help.
> > >> Thanks
> > >>
> > >>
> > >> On Thu, Mar 24, 2011 at 9:57 PM, Jean-Daniel Cryans <[email protected]> wrote:
> > >>
> > >>> What you are asking for is a secondary index, and it doesn't exist at
> > >>> the moment in HBase (let alone REST). Googling a bit for "hbase
> > >>> secondary indexing" will show you how people usually do it.
> > >>>
> > >>> J-D
> > >>>
> > >>> On Thu, Mar 24, 2011 at 6:18 AM, sreejith P. K. <[email protected]> wrote:
> > >>>> Is it possible, using the Stargate interface to HBase, to fetch all
> > >>>> rows where more than one column family:<qualifier> must be present?
> > >>>>
> > >>>> Like: select rows which contain keyword:a and keyword:b ?
> > >>>>
> > >>>> Thanks
> > >>>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Sreejith PK
> > >> Nesote Technologies (P) Ltd
> > >>
> >
>
