Re: coprocessors WAS -> Re: Parallel computing on HBase

Mingjie Lai Sun, 10 Oct 2010 08:24:54 -0700

William.

For your particular request:

> each region server can
> calculate the mean of rows it contains on itself instead of transporting
> every row back to the client


There is a related jira opened here:
https://issues.apache.org/jira/browse/HBASE-1512

Now it's a sub-ticket of HBASE-2000.

You have two options for computing an aggregate such as a mean over a region on the region server:

1) Wait for the MapReduce framework for coprocessors. However, it won't be available soon, since it isn't our highest priority right now. Andy had a prototype, but we decided to take it out of the HBASE-2001 patch. You may want to contribute to its new design. (I will create a new jira for the mapred framework for coprocessors.)

2) Use the CommandTarget: there is a simple CommandTarget sample which performs column aggregation on the region server. But you need to know some HBase internal logic to build the CommandTarget. I think this piece will be checked in to TRUNK soon.
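
To make the idea behind both options concrete, here is a minimal sketch in plain Java. It does not use any HBase or coprocessor API; the class and method names (RegionMeanSketch, Partial, aggregateRegion, combineMean) are hypothetical and the "regions" are just in-memory lists. It only illustrates why each region should return a (sum, count) pair rather than its own mean: the client can then combine the pairs into the exact overall mean without seeing any rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionMeanSketch {
    /** Hypothetical partial result a region would send back: a sum and a row count. */
    static final class Partial {
        final double sum;
        final long count;
        Partial(double sum, long count) { this.sum = sum; this.count = count; }
    }

    /** Would run on the region server: folds the region's local values into one Partial. */
    static Partial aggregateRegion(List<Double> valuesInRegion) {
        double sum = 0.0;
        for (double v : valuesInRegion) {
            sum += v;
        }
        return new Partial(sum, valuesInRegion.size());
    }

    /** Would run on the client: merges the partials and divides once at the end. */
    static double combineMean(List<Partial> partials) {
        double sum = 0.0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no rows to average");
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // Two simulated regions of different sizes.
        List<List<Double>> regions = Arrays.asList(
                Arrays.asList(1.0, 2.0, 3.0),
                Arrays.asList(10.0));
        List<Partial> partials = new ArrayList<>();
        for (List<Double> region : regions) {
            partials.add(aggregateRegion(region));
        }
        // Prints 4.0 (16 / 4). Averaging the per-region means would give 6.0 instead.
        System.out.println(combineMean(partials));
    }
}
```

Only two numbers per region cross the network in this scheme, regardless of how many rows each region holds.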


Thanks,
Mingjie

On 10/07/2010 12:44 PM, William Kang wrote:

Hi St. Ack,
Thanks a lot for your information. I will look them up. If the coprocessors
can work with the 0.90 manual balanced hbase, that would be really nice.


William

On Thu, Oct 7, 2010 at 2:31 PM, Stack<[email protected]>  wrote:

William:

Coprocessors will be committed to TRUNK sometime in the next few days.
They are well documented. I suggest you start with this
package-info.html posted to hbase-2001 by Andrew and Mingjie:
https://issues.apache.org/jira/secure/attachment/12456164/packge-info.html

It serves as a good intro to the utility coprocessors add and has
good example uses, including examples that strongly resemble what
you would like to do, described below.

St.Ack


On Wed, Oct 6, 2010 at 11:08 PM, William Kang<[email protected]> wrote:

Ryan, thanks for your explanation. It is very clear and helpful.

Andy, I think HBASE-2000 is exactly what I was asking for. In general, MR is
not built for low-latency purposes. But our applications do need something
fast and lightweight. For example, we might just want to know the mean of
our query results over some values inside rows. If each region server can
calculate the mean of rows it contains on itself instead of transporting
every row back to the client, it would be much faster to get the final
result. Will HBASE-2000 be able to do it? And would you please share more
information about the development process and how I may contribute to it?
Many thanks.


William

On Wed, Oct 6, 2010 at 11:57 AM, Andrew Purtell<[email protected]> wrote:

Hi William,

I think you are asking about HBASE-2000:
https://issues.apache.org/jira/browse/HBASE-2000

Work on an in-process parallel execution framework for HBase is in
progress, yes. We have some initial patches up for review which are the
start of this.

Best regards,

    - Andy


--- On Tue, 10/5/10, Ryan Rawson<[email protected]>  wrote:

From: Ryan Rawson<[email protected]>
Subject: Re: Parallel computing on HBase
To: [email protected]
Date: Tuesday, October 5, 2010, 11:10 PM
You understand the hbase data model, yes? Each region gets a mapper
and each mapper reads the rows for that region, feeding them into the map
functions. On the output side, each reducer just writes to hbase. The
parallelism can support millions of row reads/second.
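
The one-mapper-per-region pattern described above can be sketched in plain Java as follows. This is not the Hadoop or HBase MapReduce API: the names MapperPerRegionSketch, mapRegion and runJob are hypothetical, each "region" is an in-memory list of row keys, and a thread pool stands in for the cluster. The point is only that each region is read by its own task, so reads proceed in parallel and nothing funnels through a single client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MapperPerRegionSketch {
    /** Stand-in for a map function: here it simply counts the rows it is fed. */
    static long mapRegion(List<String> rowsInRegion) {
        long seen = 0;
        for (String row : rowsInRegion) {
            seen++;
        }
        return seen;
    }

    /** Starts one task per region, runs the tasks concurrently, and sums their outputs. */
    static long runJob(List<List<String>> regions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (final List<String> region : regions) {
                Callable<Long> task = () -> mapRegion(region);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // waits for that region's task to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("region task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Three simulated regions holding five row keys in total.
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("r1", "r2"),
                Arrays.asList("r3"),
                Arrays.asList("r4", "r5"));
        System.out.println(runJob(regions)); // prints 5
    }
}
```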

I don't understand the rest of your question, unfortunately.

good luck!
-ryan

On Tue, Oct 5, 2010 at 9:40 PM, William Kang<[email protected]> wrote:

Can you tell me a little about how HBase works with MR? If the MR
source/sink has to go through just ONE region client, then it is not what
I am looking for. But if MR can plug directly into the region server
containing specific rows, then it might work. Furthermore, MR is a
heavyweight process with lots of overhead. Ideally, we want something
lightweight that can get results fast. Many thanks.


William

On Wed, Oct 6, 2010 at 12:01 AM, Jeff Zhang<[email protected]> wrote:

You can incorporate map reduce with hbase for parallel computing.




On Wed, Oct 6, 2010 at 11:24 AM, William Kang <[email protected]> wrote:

Hi guys,
Is there any project going on co-processing on region servers? Right now,
we have to transfer all data from region servers to region client after
query, is that right? This can be slow. Furthermore, the cpus on the region
servers are not fully used. If we could distribute the computation along
with the data on region server, that would be really handy for some
problems. Is it possible to do so? Many thanks.


William




--
Best Regards

Jeff Zhang
