On Mon, Aug 26, 2013 at 10:56 PM, Olle Mårtensson <[email protected]> wrote:
> Thank you for the link, Anil, it was a good explanation indeed.
>
> > It's not recommended to do put/deletes across
> > region servers like this.
>
> That was not my intention; I want to keep the region for the aggregates and
> the aggregated values on the same server. I read in the link that you gave
> me that I can achieve this by using a coprocessor on the master, so I will
> try that out.
>
> > Try to move this aggregation to the client side
> > or at least outside the RS.
>
> This is what I am trying to avoid, since doing this would cause big data
> transfers between the client and the region server.
> The whole purpose of using the coprocessor is to push the aggregation work
> to the nodes where the data is local and to minimize data transfer between
> the nodes.
>
> Why do you think it's a bad idea to compute aggregate values inside the
> regionserver? Is it because it occupies RPC threads, or because it's not a
> good use case for coprocessors?

I got the impression that your code is doing inter-RS puts/gets from the
coprocessor.

> Do you think it's a bad idea even if I keep the regions for the two rows
> involved on the same regionserver and bypass RPC as the link suggests?

In my opinion, it should be fine then. I am not aware of how heavy/complex
your aggregations are. Obviously, the more complex your CP (coprocessor) is,
the more load you are putting on the RS.

> Thanks // Olle
>
>
> On Mon, Aug 26, 2013 at 5:43 PM, anil gupta <[email protected]> wrote:
>
> > On Mon, Aug 26, 2013 at 7:27 AM, Olle Mårtensson
> > <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I have developed a coprocessor that extends BaseRegionObserver and
> > > implements the postPut method. The postPut method scans the columns of
> > > the row that the put was issued on and calculates an aggregate based on
> > > these values; when this is done, a row in another table is updated with
> > > the aggregated value.
> > >
> > This is an anti-pattern. It's not recommended to do put/deletes across
> > region servers like this. Try to move this aggregation to the client side
> > or at least outside the RS. Here is the link for a much more detailed
> > explanation of why this is not good: http://search-hadoop.com/m/XtAi5Fogw32
> >
> > > This works out fine until I put some stress on one row; then the threads
> > > on the regionserver hosting the table will freeze on flushing the put on
> > > the aggregated value.
> > > The client application basically does 100 concurrent puts on one row in
> > > a tight loop (on the table where the coprocessor is activated).
> > > After that the client sleeps for a while and tries to fetch the
> > > aggregated value, and here the client freezes and periodically burps out
> > > exceptions.
> > > It works if I don't run so many puts in parallel.
> > >
> > > The HBase environment is pseudo-distributed 0.94.11 with one
> > > regionserver.
> > >
> > > I have tried using a connection pool in the coprocessor, bumped up the
> > > heap size of the regionserver, and also upped the number of RPC threads
> > > for the regionserver, but without luck.
> > >
> > > The pseudocode for postPut would be something like this:
> > >
> > > vals = env.getRegion().get(get).getFamilyMap().values()
> > > agg_val = aggregate(vals)
> > > agg_table = env.getTable("aggregates")
> > > agg_table.setAutoFlush(false)
> > > put = new Put()
> > > put.add(agg_val)
> > > agg_table.put(put)
> > > agg_table.flushCommits()
> > > agg_table.close()
> > >
> > > And the real Clojure variant is:
> > >
> > > https://gist.github.com/ollez/d0450930a591912aea5d#file-gistfile1-clj
> > >
> > > The hbase-site.xml:
> > >
> > > https://gist.github.com/ollez/d0450930a591912aea5d#file-hbase-site-xml
> > >
> > > The regionserver stacktrace:
> > >
> > > https://gist.github.com/ollez/d0450930a591912aea5d#file-regionserver-stacktrace
> > >
> > > The client exceptions:
> > >
> > > https://gist.github.com/ollez/d0450930a591912aea5d#file-client-exceptions
> > >
> > > Thanks // Olle
> > >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >

--
Thanks & Regards,
Anil Gupta
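
For reference, below is a minimal Java sketch of the kind of postPut observer
the pseudocode above describes, written against the 0.94 coprocessor API. The
"aggregates" table name comes from the pseudocode; the column family "d", the
qualifier "sum", long-encoded cell values, and keying the aggregate by the same
row key are assumptions made purely for illustration, not details from the thread.

    import java.io.IOException;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AggregatingObserver extends BaseRegionObserver {

      private static final byte[] AGG_TABLE = Bytes.toBytes("aggregates");
      private static final byte[] CF        = Bytes.toBytes("d");   // assumed column family
      private static final byte[] QUAL      = Bytes.toBytes("sum"); // assumed qualifier

      @Override
      public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                          Put put, WALEdit edit, boolean writeToWAL) throws IOException {
        RegionCoprocessorEnvironment env = ctx.getEnvironment();

        // Re-read the row that was just written. This stays inside the
        // hosting region, so it does not cost an extra RPC.
        Get get = new Get(put.getRow());
        get.addFamily(CF);
        Result row = env.getRegion().get(get, null);

        // Aggregate the cell values (here: a plain sum of long-encoded cells).
        long sum = 0;
        for (KeyValue kv : row.raw()) {
          sum += Bytes.toLong(kv.getValue());
        }

        // Update the aggregates table. env.getTable() uses the normal client
        // path, so this put occupies a second RPC handler unless the target
        // region is hosted by the same region server.
        HTableInterface aggTable = env.getTable(AGG_TABLE);
        try {
          Put aggPut = new Put(put.getRow()); // assumed: aggregate keyed by the same row key
          aggPut.add(CF, QUAL, Bytes.toBytes(sum));
          aggTable.put(aggPut);
        } finally {
          aggTable.close();
        }
      }
    }

Because the write to "aggregates" goes through the regular client path, a burst
of concurrent puts against a single region server can leave every RPC handler
stuck inside postPut waiting for another handler to serve that second write,
which would be consistent with the stalls described above.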
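
And a sketch of the alternative Anil suggests, doing the aggregation on the
client side instead of inside the region server. The source table name "data",
family "d", and qualifier "sum" are again assumptions, not names taken from the
thread; the trade-off is exactly the one Olle raises, since the whole row has to
travel to the client before the aggregate can be computed.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ClientSideAggregator {

      private static final byte[] CF   = Bytes.toBytes("d");    // assumed column family
      private static final byte[] QUAL = Bytes.toBytes("sum");  // assumed qualifier

      public static void aggregateRow(Configuration conf, byte[] rowKey) throws IOException {
        HTable dataTable = new HTable(conf, "data");        // assumed source table name
        HTable aggTable  = new HTable(conf, "aggregates");
        try {
          // Read the row and sum its long-encoded cell values on the client.
          Result row = dataTable.get(new Get(rowKey));
          long sum = 0;
          if (!row.isEmpty()) {
            for (byte[] value : row.getFamilyMap(CF).values()) {
              sum += Bytes.toLong(value);
            }
          }

          // Write the aggregate back; no coprocessor involved, so region
          // server handler threads never block on cross-table writes.
          Put aggPut = new Put(rowKey);
          aggPut.add(CF, QUAL, Bytes.toBytes(sum));
          aggTable.put(aggPut);
        } finally {
          dataTable.close();
          aggTable.close();
        }
      }

      public static void main(String[] args) throws IOException {
        aggregateRow(HBaseConfiguration.create(), Bytes.toBytes(args[0]));
      }
    }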
