Re: coprocessors WAS -> Re: Parallel computing on HBase

William Kang Sun, 10 Oct 2010 15:27:14 -0700

Hi Mingjie,
Thanks so much for your update on this issue. I will look into the code to
see what would be the best way to get it to work.



William

On Sun, Oct 10, 2010 at 11:24 AM, Mingjie Lai <[email protected]> wrote:

> William.
>
> For your particular request:
>
>
> > each region server can calculate the mean of rows it contains on
> > itself instead of transporting every row back to the client
>
> There is a related jira opened here:
> https://issues.apache.org/jira/browse/HBASE-1512
>
> Now it's a sub-ticket of HBASE-2000.
>
> You have two options for computing an aggregate/mean over a region on the
> region server:
> 1) Wait for the MapReduce framework for coprocessors. However, it won't be
> available soon, since it isn't our highest priority right now. Andy had a
> prototype, but we decided to take it out of the HBASE-2001 patch. You may
> want to contribute to its new design. (I will create a new jira for the
> mapred framework for coprocessors.)
>
> 2) Use the CommandTarget: there is a simple CommandTarget sample which
> performs column aggregation on the region server. But you need to know some
> HBase internals to build a CommandTarget. I think this piece will be checked
> in to TRUNK soon.
>
> Thanks,
> Mingjie
>
>
> On 10/07/2010 12:44 PM, William Kang wrote:
>
>> Hi St. Ack,
>> Thanks a lot for your information. I will look them up. If the
>> coprocessors can work with the 0.90 manually balanced hbase, that would be
>> really nice.
>>
>>
>> William
>>
>> On Thu, Oct 7, 2010 at 2:31 PM, Stack<[email protected]>  wrote:
>>
>>  William:
>>>
>>> Coprocessors will be committed to TRUNK sometime in the next few days.
>>>  They are well documented.  I suggest you start with this
>>> package-info.html posted to hbase-2001 by Andrew and Mingjie:
>>>
>>> https://issues.apache.org/jira/secure/attachment/12456164/packge-info.html
>>> .
>>>  It serves as a good intro to the utility coprocessors add and has
>>> good example uses including examples that resemble strongly that which
>>> you would like to do, described below.
>>>
>>> St.Ack
>>>
>>>
>>> On Wed, Oct 6, 2010 at 11:08 PM, William Kang<[email protected]>
>>> wrote:
>>>
>>>> Ryan, thanks for your explanation. It is very clear and helpful.
>>>>
>>>> Andy, I think HBASE-2000 is exactly what I was asking for. In general,
>>>> MR is not built for low-latency purposes. But our applications do need
>>>> something fast and lightweight. For example, we might just want to know
>>>> the mean of our query results over some values inside rows. If each
>>>> region server can calculate the mean of the rows it contains by itself
>>>> instead of transporting every row back to the client, it would be much
>>>> faster to get the final result. Will HBASE-2000 be able to do it? And
>>>> would you please share more information about the development process
>>>> and how I may contribute to it?
>>>> Many thanks.
>>>>
>>>>
>>>> William
>>>>
>>>> On Wed, Oct 6, 2010 at 11:57 AM, Andrew Purtell<[email protected]> wrote:
>>>>
>>>>  Hi William,
>>>>>
>>>>> I think you are asking about HBASE-2000:
>>>>> https://issues.apache.org/jira/browse/HBASE-2000
>>>>>
>>>>> Work on an in-process parallel execution framework for HBase is in
>>>>> progress, yes. We have some initial patches up for review which are the
>>>>> start of this.
>>>>>
>>>>> Best regards,
>>>>>
>>>>>    - Andy
>>>>>
>>>>>
>>>>> --- On Tue, 10/5/10, Ryan Rawson<[email protected]>  wrote:
>>>>>
>>>>>  From: Ryan Rawson<[email protected]>
>>>>>> Subject: Re: Parallel computing on HBase
>>>>>> To: [email protected]
>>>>>> Date: Tuesday, October 5, 2010, 11:10 PM
>>>>>> You understand the hbase data model, yes? Each region gets a mapper,
>>>>>> and each mapper reads the rows for that region, feeding them into the
>>>>>> map functions. On the output side, each reducer just writes to hbase.
>>>>>> The parallelism can support millions of row reads/second.
>>>>>>
>>>>>> I don't understand the rest of your question
>>>>>> unfortunately.
>>>>>>
>>>>>> good luck!
>>>>>> -ryan
>>>>>>
>>>>>> On Tue, Oct 5, 2010 at 9:40 PM, William Kang<[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Can you tell me a little about how HBase works with MR? If the MR
>>>>>>> source/sink has to go through just ONE region client, then it is not
>>>>>>> what I am looking for. But if MR can plug directly into the region
>>>>>>> server containing specific rows, then it might work. Furthermore, MR
>>>>>>> is a heavyweight process with lots of overhead. Ideally, we want
>>>>>>> something lightweight that can get results fast. Many thanks.
>>>>>>>
>>>>>>>
>>>>>>> William
>>>>>>>
>>>>>>> On Wed, Oct 6, 2010 at 12:01 AM, Jeff Zhang<[email protected]> wrote:
>>>>>>>
>>>>>>>> You can incorporate map reduce with hbase for parallel computing.
>>>>>>>>
>>>>>>>> On Wed, Oct 6, 2010 at 11:24 AM, William Kang <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi guys,
>>>>>>>>> Is there any project going on for co-processing on region
>>>>>>>>> servers? Right now, we have to transfer all data from the region
>>>>>>>>> servers to the region client after a query, is that right? This
>>>>>>>>> can be slow. Furthermore, the CPUs on the region servers are not
>>>>>>>>> fully used. If we could distribute the computation along with the
>>>>>>>>> data on the region servers, that would be really handy for some
>>>>>>>>> problems. Is it possible to do so? Many thanks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> William
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Jeff Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
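
The region-side mean discussed in this thread comes down to having each
region return a small (sum, count) summary instead of its rows, and letting
the client merge those summaries. Below is a minimal sketch of that idea in
plain Python. It does not use any HBase or coprocessor API: the names
PartialMean, region_partial and combine are hypothetical, and the three
"regions" are made-up in-memory lists standing in for the values a region
server would scan locally.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PartialMean:
    """Per-region summary (hypothetical type): sum and count of the values."""
    total: float
    count: int


def region_partial(values: Iterable[float]) -> PartialMean:
    """Stand-in for the server-side step: reduce one region to (sum, count)."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return PartialMean(total, count)


def combine(partials: Iterable[PartialMean]) -> float:
    """Client-side step: merge the per-region summaries into one mean."""
    total = 0.0
    count = 0
    for p in partials:
        total += p.total
        count += p.count
    if count == 0:
        raise ValueError("no values to average")
    return total / count


# Three made-up regions of different sizes (illustrative data only).
regions: List[List[float]] = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
partials = [region_partial(r) for r in regions]
mean = combine(partials)
print(mean)
```

Only one (sum, count) pair per region crosses the network, regardless of
how many rows the region holds. The count matters: with the data above the
per-region means are 2.0, 10.0 and 5.0, and averaging those directly would
give about 5.67 rather than the true mean of 26/6, because the regions hold
different numbers of values.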
