Can you paste more of the region server log after 09:49:18,551 (up until the region server died)?

Thanks

On Thu, Jul 19, 2012 at 1:46 PM, Kevin <[email protected]> wrote:

> The log snippet just before the regionservers die looks like this:
>
> 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: putting new rowkey
> 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: new rowkey put
> 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: coproc time: 1227 ms
> 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: closing scanner
> 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: scanner closed
> <after this log statement in the endpoint code is the return statement>
>
> A coprocessorExec call may come 3-20 seconds after the previous one (it
> depends on how long the last call took). But I know the endpoints are
> finishing their code fast because throughout the log each "coproc time:"
> statement is under 5 seconds.
>
> I am using CDH4b2, which uses HBase 0.92.1.
>
> On Thu, Jul 19, 2012 at 4:35 PM, Ted Yu <[email protected]> wrote:
>
> > Kevin:
> > Can you pastebin the log snippet from region server just before it died?
> >
> > How frequent were your coprocessorExec() calls?
> > What HBase version were you using?
> >
> > Thanks
> >
> > On Thu, Jul 19, 2012 at 12:44 PM, Kevin <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I'm using endpoint coprocessors to do intense scans in parallel on some
> > > tables. I log the time it takes for each coprocessor to finish its job on
> > > the region. Each coprocessor rarely takes longer than a few seconds,
> > > with a maximum of 5 seconds (there are only 5 regions on the tables right
> > > now). As my cluster grows with data, the HTable.coprocessorExec call takes
> > > longer and longer, but the coprocessors themselves finish quickly (under
> > > 5 seconds). Eventually I see all my regionservers die because the
> > > coprocessorExec call timed out and ZooKeeper kills the connection, which
> > > makes the regionserver die.
> > >
> > > In terms of code structure, the coprocessorExec call is done inside a
> > > for-loop. The for-loop iterates over a List of objects to help form
> > > filters for the endpoint and then calls coprocessorExec once per object
> > > processed.
> > >
> > > What would be the bottleneck? Is calling the coprocessor like this in a
> > > for-loop loading the regions down and not allowing them time to do GC? Is
> > > there a way to ping a table and judge if it'll be ready for the endpoint
> > > call?
> > >
> > > Thanks,
> > > -Kevin
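
The for-loop described in the original message could be instrumented to show how much of each call is spent outside the endpoint. Below is a minimal sketch in Java, not the poster's code. EndpointCall, Timing and timeCalls are hypothetical names; EndpointCall stands in for whatever wraps the HTable.coprocessorExec call. It also assumes the endpoint hands its own elapsed time back to the client, whereas the thread only says that time is logged on the region server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ExecLoopTiming {

    /**
     * Hypothetical stand-in for one coprocessorExec call made by the loop.
     * Assumption: it returns the elapsed time (ms) reported by the endpoint.
     * The real client call can throw; a wrapper would need to handle that.
     */
    public interface EndpointCall<T> {
        long invoke(T filterSpec);
    }

    /** Timing of one loop iteration, seen from the client and from the endpoint. */
    public static final class Timing {
        public final long clientMillis;   // wall-clock time of the whole call
        public final long endpointMillis; // time the endpoint says it spent

        Timing(long clientMillis, long endpointMillis) {
            this.clientMillis = clientMillis;
            this.endpointMillis = endpointMillis;
        }

        /** Time spent outside the endpoint code (RPC, queueing, and so on). */
        public long overheadMillis() {
            return clientMillis - endpointMillis;
        }
    }

    /** Runs one call per list element, as in the described loop, and times each. */
    public static <T> List<Timing> timeCalls(List<T> filterSpecs, EndpointCall<T> call) {
        List<Timing> timings = new ArrayList<>();
        for (T spec : filterSpecs) {
            long start = System.nanoTime();
            long endpointMillis = call.invoke(spec);
            long clientMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            timings.add(new Timing(clientMillis, endpointMillis));
        }
        return timings;
    }

    public static void main(String[] args) {
        // Fake endpoint for demonstration: takes about 50 ms but reports 10 ms.
        EndpointCall<String> fake = spec -> {
            long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(50);
            while (System.nanoTime() < end) {
                // busy-wait to simulate a slow round trip
            }
            return 10L;
        };
        for (Timing t : timeCalls(Arrays.asList("a", "b", "c"), fake)) {
            System.out.println("client=" + t.clientMillis + " ms, endpoint="
                    + t.endpointMillis + " ms, overhead=" + t.overheadMillis() + " ms");
        }
    }
}
```

If the overhead column grows as the tables grow while the endpoint column stays under 5 seconds, the time is going somewhere other than the endpoint code itself. The sketch does not say where.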
