Kevin: Can you pastebin the log snippet from the region server just before it died?
How frequent were your coprocessorExec() calls? What HBase version were you using?

Thanks

On Thu, Jul 19, 2012 at 12:44 PM, Kevin <[email protected]> wrote:

> Hi,
>
> I'm using endpoint coprocessors to do intense scans in parallel on some
> tables. I log the time it takes for each coprocessor to finish its job on
> the region. Each coprocessor rarely takes longer than a few seconds,
> maximum of 5 seconds (there are only 5 regions on the tables right now). As
> my cluster grows with data the call HTable.coprocessorExec takes longer and
> longer but the coprocessors themselves finish quickly (under 5 seconds).
> Eventually I see all my regionservers die because the coprocessorExec call
> timed out and zookeeper kills the connection, which makes the regionserver
> die.
>
> In terms of code structure, the coprocessorExec call is done inside a
> for-loop. The for-loop iterates over a List of objects to help form filters
> for the endpoint and then calls the coprocessorExec once per object
> processed.
>
> What would be the bottleneck? Is calling the coprocessor like this in a
> for-loop loading the regions down and not allowing them time to do GC? Is
> there a way to ping a table and judge if it'll be ready for the endpoint
> call?
>
> Thanks,
> -Kevin
>
