William: Coprocessors will be committed to TRUNK sometime in the next few days. They are well documented. I suggest you start with the package-info.html posted to HBASE-2001 by Andrew and Mingjie: https://issues.apache.org/jira/secure/attachment/12456164/packge-info.html. It serves as a good intro to the utility coprocessors add and includes good example uses, several of which strongly resemble what you would like to do, as described below.
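
For the mean-of-values case described below, a client call against a server-side aggregation endpoint might look roughly like this. This is an untested sketch: it assumes an aggregation endpoint built on top of the coprocessor framework is deployed on the table (the AggregationClient and LongColumnInterpreter names used here come from that work, not from this thread), and the table, family, and qualifier names are made up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionSideMean {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    // The aggregation endpoint runs inside each region server: every region
    // computes a partial sum/count locally and only those partials travel
    // back to the client, which combines them into the final mean.
    AggregationClient aggregationClient = new AggregationClient(conf);
    Scan scan = new Scan();
    // "mytable" and "cf"/"value" are made-up names for illustration only.
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"));
    double mean = aggregationClient.avg(Bytes.toBytes("mytable"),
        new LongColumnInterpreter(), scan);
    System.out.println("mean = " + mean);
  }
}

The point is that the scan itself never leaves the region servers; only one small partial result per region crosses the wire.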
St.Ack

On Wed, Oct 6, 2010 at 11:08 PM, William Kang <[email protected]> wrote:
> Ryan, thanks for your explanation. It is very clear and helpful.
>
> Andy, I think HBASE-2000 is exactly what I was asking for. In general, MR is
> not built for low latency, but our applications do need something fast and
> lightweight. For example, we might just want to know the mean of our query
> results over some values inside rows. If each region server could calculate
> the mean of the rows it contains instead of transporting every row back to
> the client, it would be much faster to get the final result. Will HBASE-2000
> be able to do this? And would you please share more information about the
> development process and how I may contribute to it? Many thanks.
>
>
> William
>
> On Wed, Oct 6, 2010 at 11:57 AM, Andrew Purtell <[email protected]> wrote:
>
>> Hi William,
>>
>> I think you are asking about HBASE-2000:
>> https://issues.apache.org/jira/browse/HBASE-2000
>>
>> Work on an in-process parallel execution framework for HBase is in
>> progress, yes. We have some initial patches up for review which are the
>> start of this.
>>
>> Best regards,
>>
>>   - Andy
>>
>>
>> --- On Tue, 10/5/10, Ryan Rawson <[email protected]> wrote:
>>
>> > From: Ryan Rawson <[email protected]>
>> > Subject: Re: Parallel computing on HBase
>> > To: [email protected]
>> > Date: Tuesday, October 5, 2010, 11:10 PM
>> >
>> > You understand the HBase data model, yes? Each region gets a mapper,
>> > and each mapper reads the rows for that region, feeding them into the
>> > map functions. On the output side, each reducer just writes to HBase.
>> > The parallelism can support millions of row reads per second.
>> >
>> > I don't understand the rest of your question, unfortunately.
>> >
>> > good luck!
>> > -ryan
>> >
>> > On Tue, Oct 5, 2010 at 9:40 PM, William Kang <[email protected]> wrote:
>> > > Can you tell me a little about how HBase works with MR? If the MR
>> > > source/sink has to go through just ONE client, then it is not what I
>> > > am looking for. But if MR can plug directly into the region server
>> > > containing specific rows, then it might work. Furthermore, MR is a
>> > > heavyweight process with lots of overhead. Ideally, we want something
>> > > lightweight that can get results fast. Many thanks.
>> > >
>> > >
>> > > William
>> > >
>> > > On Wed, Oct 6, 2010 at 12:01 AM, Jeff Zhang <[email protected]> wrote:
>> > >
>> > >> You can incorporate MapReduce with HBase for parallel computing.
>> > >>
>> > >>
>> > >> On Wed, Oct 6, 2010 at 11:24 AM, William Kang <[email protected]>
>> > >> wrote:
>> > >> > Hi guys,
>> > >> > Is there any project going on for co-processing on region servers?
>> > >> > Right now, we have to transfer all data from the region servers to
>> > >> > the client after a query, is that right? This can be slow.
>> > >> > Furthermore, the CPUs on the region servers are not fully used. If
>> > >> > we could distribute the computation along with the data on the
>> > >> > region servers, that would be really handy for some problems. Is it
>> > >> > possible to do so? Many thanks.
>> > >> >
>> > >> >
>> > >> > William
>> > >> >
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Best Regards
>> > >>
>> > >> Jeff Zhang
>> > >>
>> > >
>> >
>>
>>
>>
>
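
For comparison, the plain MapReduce route Ryan and Jeff describe above might look roughly like the following. Again an untested sketch: the table name "mytable" and column "cf:value" are made up, and the job writes the mean to a text file rather than back into HBase.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MeanOverTable {

  // One mapper is launched per region; each reads only that region's rows
  // and emits the long stored in cf:value under a single constant key.
  static class ValueMapper extends TableMapper<Text, LongWritable> {
    private static final Text KEY = new Text("mean");
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      byte[] v = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("value"));
      if (v != null) {
        context.write(KEY, new LongWritable(Bytes.toLong(v)));
      }
    }
  }

  // A single reducer sums the values and divides by the count.
  static class MeanReducer extends Reducer<Text, LongWritable, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0, count = 0;
      for (LongWritable v : values) { sum += v.get(); count++; }
      context.write(key, new Text(Double.toString((double) sum / count)));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "mean over mytable");
    job.setJarByClass(MeanOverTable.class);
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"));
    scan.setCaching(500);        // larger scanner batches for MR
    scan.setCacheBlocks(false);  // don't pollute the region server block cache
    TableMapReduceUtil.initTableMapperJob("mytable", scan, ValueMapper.class,
        Text.class, LongWritable.class, job);
    job.setReducerClass(MeanReducer.class);
    job.setNumReduceTasks(1);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Each region gets its own mapper, so reading is parallel across the region servers, but every value still travels to the single reducer; that is the heavier-weight path the coprocessor approach avoids.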
