Hi Mingjie,

Thanks so much for your update on this issue. I will look into the code to see what would be the best way to get it to work.
William

On Sun, Oct 10, 2010 at 11:24 AM, Mingjie Lai <[email protected]> wrote:

> William.
>
> For your particular request:
>
>> each region server can calculate the mean of rows it contains on
>> itself instead of transporting every row back to the client
>
> There is a related jira opened here:
> https://issues.apache.org/jira/browse/HBASE-1512
>
> Now it's a sub-ticket of HBASE-2000.
>
> You have 2 options to perform an aggregate/mean over a region at the
> region server:
>
> 1) Wait for the MapReduce framework for coprocessors. However, it won't
> be available soon, since it's not our highest priority right now. Andy
> had a prototype, but we decided to take it out of the HBASE-2001 patch.
> You may want to contribute to its new design. (I will create a new jira
> for the mapred framework for coprocessors.)
>
> 2) Use the CommandTarget: there is a simple CommandTarget sample which
> performs a column aggregate on the region server. But you need to know
> some HBase internal logic to build the CommandTarget. This piece will be
> checked in to TRUNK soon, I think.
>
> Thanks,
> Mingjie
>
> On 10/07/2010 12:44 PM, William Kang wrote:
>
>> Hi St. Ack,
>> Thanks a lot for your information. I will look them up. If the
>> coprocessors can work with the 0.90 manually balanced HBase, that
>> would be really nice.
>>
>> William
>>
>> On Thu, Oct 7, 2010 at 2:31 PM, Stack<[email protected]> wrote:
>>
>>> William:
>>>
>>> Coprocessors will be committed to TRUNK sometime in the next few days.
>>> They are well documented. I suggest you start with this
>>> package-info.html posted to hbase-2001 by Andrew and Mingjie:
>>>
>>> https://issues.apache.org/jira/secure/attachment/12456164/packge-info.html
>>>
>>> It serves as a good intro to the utility coprocessors add and has
>>> good example uses, including examples that strongly resemble what
>>> you would like to do, described below.
>>>
>>> St.Ack
>>>
>>> On Wed, Oct 6, 2010 at 11:08 PM, William Kang<[email protected]> wrote:
>>>
>>>> Ryan, thanks for your explanation. It is very clear and helpful.
>>>>
>>>> Andy, I think HBASE-2000 is exactly what I was asking for. In
>>>> general, MR is not built for low-latency purposes. But our
>>>> applications do need something fast and lightweight. For example,
>>>> we might just want to know the mean of our query results over some
>>>> values inside rows. If each region server can calculate the mean of
>>>> the rows it contains on itself instead of transporting every row
>>>> back to the client, it would be much faster to get the final
>>>> result. Will HBASE-2000 be able to do it? And would you please
>>>> share more information about the development process and how I may
>>>> contribute to it? Many thanks.
>>>>
>>>> William
>>>>
>>>> On Wed, Oct 6, 2010 at 11:57 AM, Andrew Purtell<[email protected]> wrote:
>>>>
>>>>> Hi William,
>>>>>
>>>>> I think you are asking about HBASE-2000:
>>>>> https://issues.apache.org/jira/browse/HBASE-2000
>>>>>
>>>>> Work on an in-process parallel execution framework for HBase is in
>>>>> progress, yes. We have some initial patches up for review which
>>>>> are the start of this.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> - Andy
>>>>>
>>>>> --- On Tue, 10/5/10, Ryan Rawson<[email protected]> wrote:
>>>>>
>>>>>> From: Ryan Rawson<[email protected]>
>>>>>> Subject: Re: Parallel computing on HBase
>>>>>> To: [email protected]
>>>>>> Date: Tuesday, October 5, 2010, 11:10 PM
>>>>>>
>>>>>> You understand the hbase data model, yes? Each region gets a
>>>>>> mapper, and each mapper reads the rows for that region, feeding
>>>>>> them into the map functions. On the output side, each reducer
>>>>>> just writes to hbase. The parallelism can support millions of
>>>>>> row reads/second.
>>>>>>
>>>>>> I don't understand the rest of your question, unfortunately.
>>>>>>
>>>>>> good luck!
>>>>>> -ryan
>>>>>>
>>>>>> On Tue, Oct 5, 2010 at 9:40 PM, William Kang<[email protected]> wrote:
>>>>>>
>>>>>>> Can you tell me a little about how HBase works with MR? If the
>>>>>>> MR source/sink has to go through just ONE region client, then
>>>>>>> it is not what I am looking for. But if MR can plug directly
>>>>>>> into the region server containing specific rows, then it might
>>>>>>> work. Furthermore, MR is a heavyweight process with lots of
>>>>>>> overhead. Ideally, we want something lightweight that can get
>>>>>>> results fast. Many thanks.
>>>>>>>
>>>>>>> William
>>>>>>>
>>>>>>> On Wed, Oct 6, 2010 at 12:01 AM, Jeff Zhang<[email protected]> wrote:
>>>>>>>
>>>>>>>> You can incorporate map reduce with hbase for parallel
>>>>>>>> computing.
>>>>>>>>
>>>>>>>> On Wed, Oct 6, 2010 at 11:24 AM, William Kang <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi guys,
>>>>>>>>> Is there any project going on co-processing on region
>>>>>>>>> servers? Right now, we have to transfer all data from region
>>>>>>>>> servers to the region client after a query, is that right?
>>>>>>>>> This can be slow. Furthermore, the CPUs on the region servers
>>>>>>>>> are not fully used. If we could distribute the computation
>>>>>>>>> along with the data on the region servers, that would be
>>>>>>>>> really handy for some problems. Is it possible to do so?
>>>>>>>>> Many thanks.
>>>>>>>>>
>>>>>>>>> William
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Jeff Zhang
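
For readers following the thread, the request above (each region server computes over its own rows, and the client only combines the results) can be sketched in a few lines. This is a plain-Python illustration of the idea only, not HBase code: `PartialMean`, `region_partial`, and `combine` are hypothetical names, and the "regions" are ordinary lists standing in for the rows held by each region server. The point it shows is that each region should return a (sum, count) pair rather than its own mean, since averaging per-region means gives the wrong answer when regions hold different numbers of rows.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PartialMean:
    """Hypothetical per-region result: enough to rebuild a global mean."""
    total: float
    count: int


def region_partial(values: Iterable[float]) -> PartialMean:
    """Stand-in for the server-side step, run where the region's rows live."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return PartialMean(total, count)


def combine(partials: Iterable[PartialMean]) -> float:
    """Client-side step: merge the small partials instead of raw rows."""
    partials = list(partials)
    count = sum(p.count for p in partials)
    if count == 0:
        raise ValueError("no rows to average")
    return sum(p.total for p in partials) / count


# Three made-up regions of unequal size (one of them empty).
regions = [[1.0, 2.0, 3.0], [10.0], []]
mean = combine(region_partial(rows) for rows in regions)
print(mean)
```

Only one small pair per region crosses the network here, which is the saving the original question was after; how such a step would actually be deployed on a region server depends on the coprocessor work discussed in the thread.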
