Re: Coprocessor Increments

Ted Yu Mon, 14 Oct 2013 15:35:45 -0700

Anil:
bq. We also use CPs wherever they are appropriate (like HBASE-7474).

HBASE-7474 has been dormant for several months. Do you want to revive it?


Cheers


On Mon, Oct 14, 2013 at 3:25 PM, anil gupta <[email protected]> wrote:

> Inline.
>
>
> On Mon, Oct 14, 2013 at 7:50 AM, Michael Segel <[email protected]> wrote:
>
> > Anil,
> >
> > I wasn't suggesting that you can't do what you're doing, but you end up
> > running into the risks that coprocessors are supposed to remove. The
> > standard YMMV always applies.
> >
> I agree with you. But, in my knowledge and experience, coprocessors are
> meant to be used for operations that are local to the RS. Otherwise, you
> are in danger of running into deadlocks and scalability issues.
>
> >
> > You have a cluster… another team in your company wants to use the cluster.
> > So instead of the cluster being a single resource for your app/team, it now
> > becomes a shared resource. So now you have people accessing HBase for
> > multiple apps.
> >
> Well, it's a separation of responsibility in this case. We don't want teams
> to step on each other's toes, but at the same time we want everything to
> work well as an ecosystem.
> Rule: Other teams can use the same cluster, but they cannot write directly
> into the tables that we own/control. If they want to write into our tables,
> they have to use our HBase client.
>
> >
> > You could then run multiple HBase HMasters with different locations for
> > files, however… this can get messy.
> > HOYA seems to suggest this as the future.  If so, then you have to wonder
> > about data locality.
> >
> HOYA is not even in beta at present. So, right now we are not thinking
> about it.
>
> >
> > Having your app update the primary table and then the secondary index is
> > always a good fallback; however, you need to ensure that you understand the
> > risks.
> >
> Agreed, I understand that there is a risk. But you have to bite the bullet
> when you are doing something that is not supported out of the box. We also
> use CPs wherever they are appropriate (like HBASE-7474).
>
> >
> > With respect to secondary indexes… if you decouple the writes… you can get
> > better throughput. Note that the code becomes a bit more complex because
> > you're going to have to introduce a couple of different things. But that's
> > something for a different discussion…
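Michael does not spell out the decoupled design here. One possible reading, sketched below under stated assumptions, is to make the primary write synchronous and hand the index update to a queue that a background worker drains. Plain Python dicts stand in for the HBase main table and the inverted index table, and `DecoupledIndexWriter` and its methods are hypothetical names, not an HBase API.

```python
import queue
import threading


class DecoupledIndexWriter:
    """Hypothetical sketch: primary writes are synchronous, index writes are queued.

    Plain dicts stand in for the HBase main table and the inverted index table.
    Simplification: stale index entries are not removed when a row is overwritten.
    """

    def __init__(self):
        self.main_table = {}    # row key -> indexed value (stand-in for the main table)
        self.index_table = {}   # indexed value -> set of row keys (stand-in for the index)
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def put(self, row_key, value):
        # The caller waits only for the primary write; the index update is deferred.
        self.main_table[row_key] = value
        self._pending.put((row_key, value))

    def _drain(self):
        # A single background worker applies index updates in arrival order.
        while True:
            row_key, value = self._pending.get()
            self.index_table.setdefault(value, set()).add(row_key)
            self._pending.task_done()

    def flush(self):
        # Block until every queued index update has been applied.
        self._pending.join()
```

The trade-off is the one Michael mentions elsewhere in the thread: writers see lower latency, but until `flush()` (or the worker) catches up, the index lags the main table.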
> >
> Whether to use CPs or not depends on the use case. In my opinion, CPs are a
> really powerful and awesome feature in HBase. But if they are not used
> properly (like creating a cyclic graph, as in Tom's example), they can be
> problematic.
>
>
> >
> > On Oct 13, 2013, at 10:15 AM, anil gupta <[email protected]> wrote:
> >
> > > Inline.
> > >
> > > On Sun, Oct 13, 2013 at 6:02 AM, Michael Segel <[email protected]> wrote:
> > >
> > >> Ok…
> > >>
> > >> Sure, you can have your app update the secondary index table.
> > >> The only issue with that is if someone updates the base table outside of
> > >> your app, they may or may not increment the secondary index.
> > >>
> > > Anil: We don't allow people to write data into HBase from their own HBase
> > > client. We control the writes into HBase, so we don't have the problem of
> > > the secondary index not getting written.
> > > For example, if you expose a RESTful web service, you can easily control
> > > the writes to HBase. Even if a user requests to write one row in the
> > > "main table", your application can have the logic to write to the
> > > "secondary index" tables. In this way, it is also transparent to users.
> > > You can add/remove secondary indexes as you want.
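A minimal sketch of the pattern Anil describes, assuming a single service owns every write path into the table. The class and method names are hypothetical, and dicts stand in for HBase tables; against a real cluster each dict update would become a Put on the corresponding table, and the main-table write and the index writes would not be atomic.

```python
from typing import Callable, Dict, Set, Tuple

Extractor = Callable[[dict], str]


class IndexedTableClient:
    """Hypothetical client-side wrapper: the only write path into the main table.

    Every put updates the main table and then each registered inverted index.
    Dicts stand in for HBase tables; this is an illustration, not an HBase API.
    """

    def __init__(self) -> None:
        self.main_table: Dict[str, dict] = {}
        # index name -> (function extracting the indexed value, inverted table)
        self.indexes: Dict[str, Tuple[Extractor, Dict[str, Set[str]]]] = {}

    def add_index(self, name: str, extract: Extractor) -> None:
        inverted: Dict[str, Set[str]] = {}
        # Backfill the new index from rows already in the main table.
        for row_key, row in self.main_table.items():
            inverted.setdefault(extract(row), set()).add(row_key)
        self.indexes[name] = (extract, inverted)

    def remove_index(self, name: str) -> None:
        self.indexes.pop(name, None)

    def put(self, row_key: str, row: dict) -> None:
        old = self.main_table.get(row_key)
        # 1. Write the primary row first.
        self.main_table[row_key] = row
        # 2. Then maintain every secondary index, dropping the stale entry if any.
        for extract, inverted in self.indexes.values():
            if old is not None:
                stale = inverted.get(extract(old))
                if stale is not None:
                    stale.discard(row_key)
            inverted.setdefault(extract(row), set()).add(row_key)

    def lookup(self, index_name: str, value: str) -> Set[str]:
        _, inverted = self.indexes[index_name]
        return set(inverted.get(value, set()))
```

Because callers only ever go through `put`, indexes can be added (with a backfill) or removed without the callers noticing, which is the transparency Anil points to.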
> > >
> > >> Note that your secondary index doesn't have to be an inverted table, but
> > >> could be SOLR, LUCENE or something else.
> > >>
> > > Anil: As of now, we are happy with inverted tables, as they fit our use
> > > case.
> > >
> > >>
> > >> So you really want secondary indexes on the server.
> > >>
> > >> There are a couple of things that could improve the performance, although
> > >> the write to the secondary index would most likely lag under heavy load.
> > >>
> > >>
> > >> On Oct 12, 2013, at 11:27 PM, anil gupta <[email protected]> wrote:
> > >>
> > >>> John,
> > >>>
> > >>> My 2 cents:
> > >>> I tried implementing a secondary index by using Region Observers on Put.
> > >>> It works well under low load. But under heavy load, the RO could not keep
> > >>> up with the cross-region-server writes.
> > >>> Then, following Andrew's explanation, I decided not to use an RO and moved
> > >>> all the logic for building the secondary index tables into my HBase
> > >>> client. Since then, the system has been running fine under heavy load.
> > >>> IMO, if you use an RO and do cross-RS reads/writes, this will likely
> > >>> become your bottleneck in HBase.
> > >>> Is it possible for you to avoid the RO and control the writes/updates
> > >>> from the client side?
> > >>>
> > >>> Thanks,
> > >>> Anil Gupta
> > >>>
> > >>>
> > >>> On Fri, Oct 11, 2013 at 6:06 PM, John Weatherford <[email protected]> wrote:
> > >>>
> > >>>> OP Here :)
> > >>>>
> > >>>> Our current design involves a Region Observer on a table that does
> > >>>> increments on a second table. We took the approach that Michael
> > >>>> described: inside the RO, we got a new connection and everything. We
> > >>>> believe this is causing deadlocks for us. Our next attempt is going to
> > >>>> be writing to another row in the same table, where we will store the
> > >>>> increments. If this doesn't work, we are going to simply pull the
> > >>>> increments out of the RO and do them in the application or in Flume.
> > >>>>
> > >>>> @Tom Brown
> > >>>> I would be very interested to hear more about your solution of
> > >>>> aggregating the increments in another system that is then responsible
> > >>>> for updating HBase.
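Tom's actual solution is not described in this thread. As an assumption about how such an aggregating layer could work, the sketch below coalesces many small counter deltas in memory and periodically issues one increment per (row, column). `IncrementAggregator` is hypothetical, and `apply_increment` is a caller-supplied stand-in for whatever performs the real HBase increment.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (row key, column)


class IncrementAggregator:
    """Hypothetical sketch: coalesce many small increments before writing to HBase."""

    def __init__(self, apply_increment: Callable[[str, str, int], None]) -> None:
        # apply_increment stands in for the code that actually writes to HBase;
        # it is an assumption supplied by the caller, not a real client API.
        self._apply = apply_increment
        self._pending: Dict[Key, int] = defaultdict(int)
        self._lock = threading.Lock()

    def increment(self, row: str, column: str, delta: int = 1) -> None:
        with self._lock:
            self._pending[(row, column)] += delta

    def flush(self) -> int:
        """Write one combined increment per key; return the number of writes issued."""
        # Swap out the pending map under the lock, then write outside it,
        # so producers are not blocked while the (slow) writes happen.
        with self._lock:
            batch, self._pending = self._pending, defaultdict(int)
        written = 0
        for (row, column), delta in batch.items():
            if delta:
                self._apply(row, column, delta)
                written += 1
        return written
```

In this sketch, a thousand `increment("page1", "hits")` calls become a single write of 1000 at flush time. The cost is that buffered counts are lost if the process dies before `flush()`, and since increments are not idempotent, a retried write may be applied twice.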
> > >>>>
> > >>>> -jW
> > >>>>
> > >>>>
> > >>>> On Fri 11 Oct 2013 10:26:58 AM PDT, Vladimir Rodionov wrote:
> > >>>>
> > >>>>>>> With respect to the OP's design… does the deadlock occur because he's
> > >>>>>>> trying to update a column in a different row within the same table?
> > >>>>>>>
> > >>>>>>
> > >>>>> Because he is trying to update a *row* in a different region (and
> > >>>>> potentially on a different RS).
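One commonly given explanation for Vladimir's point, assumed here rather than traced through HBase internals, is handler exhaustion. Each region server has a bounded pool of RPC handler threads (sized by `hbase.regionserver.handler.count`), and an observer that synchronously writes to another server holds its handler while it waits. The toy model below uses two thread pools as stand-ins for two region servers; the function and parameter names are hypothetical.

```python
import concurrent.futures as cf
import threading


def simulate_cross_server_observers(handlers_per_server: int = 1,
                                    puts_per_server: int = 1,
                                    wait_s: float = 0.5) -> bool:
    """Toy model: return True if the two 'servers' stall waiting on each other."""
    if puts_per_server > handlers_per_server:
        raise ValueError("model assumes every put gets a handler immediately")
    # Each executor stands in for one region server's bounded RPC handler pool.
    server_a = cf.ThreadPoolExecutor(max_workers=handlers_per_server)
    server_b = cf.ThreadPoolExecutor(max_workers=handlers_per_server)
    # Ensure all incoming puts hold their handlers before any cross-server call.
    all_started = threading.Barrier(2 * puts_per_server)

    def index_write() -> str:
        return "ok"

    def put_with_observer(other: cf.ThreadPoolExecutor) -> str:
        all_started.wait()
        # The handler stays blocked here until `other` has a free handler.
        return other.submit(index_write).result(timeout=wait_s)

    futures = [server_a.submit(put_with_observer, server_b) for _ in range(puts_per_server)]
    futures += [server_b.submit(put_with_observer, server_a) for _ in range(puts_per_server)]

    stalled = False
    for f in futures:
        try:
            f.result()
        except cf.TimeoutError:
            stalled = True  # the remote write never got a handler in time
    server_a.shutdown(wait=True)
    server_b.shutdown(wait=True)
    return stalled
```

With a spare handler on each side, the cross-server write completes. Once every handler on both servers is blocked on the other, neither can serve the other, and the calls only end because of the timeout. Keeping observer work local to the region, as Anil suggests, avoids this cycle.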
> > >>>>>
> > >>>>> Best regards,
> > >>>>> Vladimir Rodionov
> > >>>>> Principal Platform Engineer
> > >>>>> Carrier IQ, www.carrieriq.com
> > >>>>> e-mail: [email protected]
> > >>>>>
> > >>>>> ________________________________________
> > >>>>> From: Michael Segel [[email protected]]
> > >>>>> Sent: Friday, October 11, 2013 9:10 AM
> > >>>>> To: [email protected]
> > >>>>> Cc: Vladimir Rodionov
> > >>>>> Subject: Re: Coprocessor Increments
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Thanks & Regards,
> > >>> Anil Gupta
> > >>
> > >>
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Gupta
> >
> >
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
