That's it. That's the end of the regionserver log. In the master's web UI, the regionserver appears in the table labeled "Dead Region Servers."
In the master's log there is:

2012-07-19 07:02:04,016 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [slave2,60020,1342694622535]
2012-07-19 07:02:04,025 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: based on AM, current region=-ROOT-,,0.70236052 is on server=slave2,60020,1342694622535 server being checked: slave2,60020,1342694622535
2012-07-19 07:02:04,025 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: based on AM, current region=.META.,,1.1028785192 is on server=slave2,60020,1342694622535 server being checked: slave2,60020,1342694622535
2012-07-19 07:02:04,027 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=slave2,60020,1342694622535 to dead servers, submitted shutdown handler to be executed, root=true, meta=true
2012-07-19 07:02:04,027 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for slave2,60020,1342694622535

The timestamps are different from the ones above, but it's from the same application, just at a different time than what I sent before. The messages would look the same, only with the timestamps being around 2012-07-19 09:49.

On Thu, Jul 19, 2012 at 4:50 PM, Ted Yu <[email protected]> wrote:

> Can you paste more of the region server log after 09:49:18,551 (till the
> region server died)?
>
> Thanks
>
> On Thu, Jul 19, 2012 at 1:46 PM, Kevin <[email protected]> wrote:
>
> > The log snippet just before the regionservers die looks like this:
> >
> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: putting new rowkey
> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: new rowkey put
> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: coproc time: 1227 ms
> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: closing scanner
> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: scanner closed
> > <after this log statement in the endpoint code is the return statement>
> >
> > A coprocessorExec call may come 3-20 seconds after the previous one (it
> > depends on how long the last call took). But I know the endpoints are
> > finishing their code fast, because throughout the log each "coproc time:"
> > statement is under 5 seconds.
> >
> > I am using CDH4b2, which uses HBase 0.92.1.
> >
> > On Thu, Jul 19, 2012 at 4:35 PM, Ted Yu <[email protected]> wrote:
> >
> > > Kevin:
> > > Can you pastebin the log snippet from the region server just before it died?
> > >
> > > How frequent were your coprocessorExec() calls?
> > > What HBase version were you using?
> > >
> > > Thanks
> > >
> > > On Thu, Jul 19, 2012 at 12:44 PM, Kevin <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm using endpoint coprocessors to do intense scans in parallel on some
> > > > tables. I log the time it takes for each coprocessor to finish its job
> > > > on the region. Each coprocessor rarely takes longer than a few seconds,
> > > > with a maximum of 5 seconds (there are only 5 regions on the tables
> > > > right now). As my cluster grows with data, the HTable.coprocessorExec
> > > > call takes longer and longer, but the coprocessors themselves finish
> > > > quickly (under 5 seconds). Eventually I see all my regionservers die
> > > > because the coprocessorExec call timed out and ZooKeeper kills the
> > > > connection, which makes the regionserver die.
> > > >
> > > > In terms of code structure, the coprocessorExec call is done inside a
> > > > for-loop. The for-loop iterates over a List of objects to help form
> > > > filters for the endpoint, and then calls coprocessorExec once per
> > > > object processed.
> > > >
> > > > What would be the bottleneck? Is calling the coprocessor like this in
> > > > a for-loop loading the regions down and not allowing them time to do
> > > > GC? Is there a way to ping a table and judge if it'll be ready for the
> > > > endpoint call?
> > > >
> > > > Thanks,
> > > > -Kevin
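[Editor's note: to make the failure mode in this thread concrete, here is a minimal, self-contained Java sketch. It is not Kevin's actual code and does not use the HBase API; the numbers and names (`oneCallMillis`, `serialLoopMillis`) are illustrative. It models why client-side wall time for a serial for-loop of endpoint calls grows with the size of the object list, even though every individual endpoint invocation stays under 5 seconds.]

```java
import java.util.Collections;
import java.util.List;

public class SerialExecCost {
    // Models one coprocessorExec(...) round trip: the client blocks until
    // every region has responded, so the slowest region dominates the call.
    static long oneCallMillis(long[] perRegionMillis) {
        long max = 0;
        for (long m : perRegionMillis) max = Math.max(max, m);
        return max;
    }

    // Models the serial for-loop over the object list described in the
    // thread: total wall time is the SUM of the per-call maxima.
    static long serialLoopMillis(List<long[]> callsPerObject) {
        long total = 0;
        for (long[] call : callsPerObject) total += oneCallMillis(call);
        return total;
    }

    public static void main(String[] args) {
        // 5 regions, each endpoint finishing in under 5 s, as in the logs...
        long[] onePass = {1227, 900, 4300, 2100, 3050};
        // ...but 20 objects in the list means 20 serial round trips.
        List<long[]> calls = Collections.nCopies(20, onePass);
        System.out.println("total wall time ms: " + serialLoopMillis(calls));
        // 20 calls x 4300 ms = 86 s of client wall time: enough to blow past
        // a typical 60 s RPC timeout even though no endpoint exceeded 5 s.
    }
}
```

Under this (assumed) model, the bottleneck is not the endpoints but the serialization of the calls: latency accumulates linearly with the list size, so issuing the calls concurrently, or batching the filters into fewer endpoint invocations, keeps wall time close to one call's maximum instead of the sum.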
