In ZooKeeper I see that the regionserver connections are timing out. I open an HTable, call coprocessorExec, and then close the HTable; this is all done inside a for-loop. I'm not sure why the regionservers are timing out. I no longer think it's a client-side issue; it may be a server-side issue with the regionservers. I don't know what else to do to debug this further.
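For reference, the client loop has roughly the following shape. This is a simplified sketch, not my exact code: IndexProtocol, buildIndex(), and the table name are placeholders standing in for my real endpoint interface.

import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;

// Placeholder for the protocol my IndexEndpoint implements; the real
// interface takes filter-building arguments rather than a plain String.
interface IndexProtocol extends CoprocessorProtocol {
  Long buildIndex(String filterSource) throws IOException;
}

public class IndexClient {
  public void runEndpoints(Configuration conf, List<String> inputs) throws Throwable {
    for (final String input : inputs) {
      HTable table = new HTable(conf, "mytable");   // opened each iteration
      try {
        // Blocks until the endpoint has run on every region of the table
        // (null start/end keys = all regions).
        Map<byte[], Long> results = table.coprocessorExec(
            IndexProtocol.class, null, null,
            new Batch.Call<IndexProtocol, Long>() {
              public Long call(IndexProtocol endpoint) throws IOException {
                return endpoint.buildIndex(input);  // filter formed from input
              }
            });
      } finally {
        table.close();                              // closed each iteration
      }
    }
  }
}

Note that coprocessorExec blocks until every region has responded, so the calls never overlap; each iteration starts only after the previous one returns. Here is the ZooKeeper log from when the sessions expired: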
2012-07-19 07:01:26,738 [myid:] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /10.10.23.124:46863 which had sessionid 0x138763251670a3d
2012-07-19 07:01:56,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x138763251670a37, timeout of 40000ms exceeded
2012-07-19 07:01:56,000 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x138763251670a37
2012-07-19 07:02:00,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x138763251670a3d, timeout of 40000ms exceeded
2012-07-19 07:02:00,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x138763251670a39, timeout of 40000ms exceeded
2012-07-19 07:02:00,001 [myid:] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x138763251670a3f, timeout of 40000ms exceeded
2012-07-19 07:02:00,001 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x138763251670a3d
2012-07-19 07:02:00,001 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x138763251670a39
2012-07-19 07:02:00,001 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x138763251670a3f
2012-07-19 07:02:04,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x138763251670a38, timeout of 40000ms exceeded
2012-07-19 07:02:04,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x138763251670a36, timeout of 40000ms exceeded
2012-07-19 07:02:04,001 [myid:] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x138763251670a3c, timeout of 40000ms exceeded
2012-07-19 07:02:04,001 [myid:] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x138763251670a3e, timeout of 40000ms exceeded
2012-07-19 07:02:04,001 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x138763251670a38
2012-07-19 07:02:04,001 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x138763251670a36
2012-07-19 07:02:04,002 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x138763251670a3c
2012-07-19 07:02:04,002 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x138763251670a3e

On Thu, Jul 19, 2012 at 4:55 PM, Kevin <[email protected]> wrote:

> That's it. That's the end of the regionserver log. In the master's web UI
> the regionserver is in the table labeled "Dead Region Servers."
>
> In the master's log there is:
>
> 2012-07-19 07:02:04,016 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [slave2,60020,1342694622535]
> 2012-07-19 07:02:04,025 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: based on AM, current region=-ROOT-,,0.70236052 is on server=slave2,60020,1342694622535 server being checked: slave2,60020,1342694622535
> 2012-07-19 07:02:04,025 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: based on AM, current region=.META.,,1.1028785192 is on server=slave2,60020,1342694622535 server being checked: slave2,60020,1342694622535
> 2012-07-19 07:02:04,027 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=slave2,60020,1342694622535 to dead servers, submitted shutdown handler to be executed, root=true, meta=true
> 2012-07-19 07:02:04,027 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for slave2,60020,1342694622535
>
> The timestamps are different from the ones above, but it's the same
> application, just run at a different time than what I sent before. The
> messages would look the same, only with timestamps around 2012-07-19 09:49.
>
> On Thu, Jul 19, 2012 at 4:50 PM, Ted Yu <[email protected]> wrote:
>
>> Can you paste more of the region server log after 09:49:18,551 (till the
>> region server died)?
>>
>> Thanks
>>
>> On Thu, Jul 19, 2012 at 1:46 PM, Kevin <[email protected]> wrote:
>>
>> > The log snippet just before the regionservers die looks like this:
>> >
>> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: putting new rowkey
>> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: new rowkey put
>> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: coproc time: 1227 ms
>> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: closing scanner
>> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: scanner closed
>> > <after this log statement in the endpoint code is the return statement>
>> >
>> > A coprocessorExec call may come 3-20 seconds after the previous one (it
>> > depends on how long the last call took). But I know the endpoints are
>> > finishing their code quickly, because throughout the log each "coproc
>> > time:" statement is under 5 seconds.
>> >
>> > I am using CDH4b2, which uses HBase 0.92.1.
>> >
>> > On Thu, Jul 19, 2012 at 4:35 PM, Ted Yu <[email protected]> wrote:
>> >
>> > > Kevin:
>> > > Can you pastebin the log snippet from the region server just before
>> > > it died?
>> > >
>> > > How frequent were your coprocessorExec() calls?
>> > > What HBase version were you using?
>> > >
>> > > Thanks
>> > >
>> > > On Thu, Jul 19, 2012 at 12:44 PM, Kevin <[email protected]> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I'm using endpoint coprocessors to do intense scans in parallel on
>> > > > some tables. I log the time it takes for each coprocessor to finish
>> > > > its job on the region. Each coprocessor rarely takes longer than a
>> > > > few seconds, with a maximum of 5 seconds (there are only 5 regions
>> > > > on the tables right now). As my cluster grows with data, the
>> > > > HTable.coprocessorExec call takes longer and longer, but the
>> > > > coprocessors themselves finish quickly (under 5 seconds).
>> > > > Eventually I see all my regionservers die because the
>> > > > coprocessorExec call timed out and ZooKeeper killed the connection,
>> > > > which makes the regionserver die.
>> > > >
>> > > > In terms of code structure, the coprocessorExec call is done inside
>> > > > a for-loop. The for-loop iterates over a List of objects that help
>> > > > form filters for the endpoint, and calls coprocessorExec once per
>> > > > object processed.
>> > > >
>> > > > What would be the bottleneck? Is calling the coprocessor like this
>> > > > in a for-loop loading down the regions and not allowing them time
>> > > > to do GC? Is there a way to ping a table and judge whether it'll be
>> > > > ready for the endpoint call?
>> > > >
>> > > > Thanks,
>> > > > -Kevin
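One more observation, prompted by the "timeout of 40000ms exceeded" lines above: 40000 ms looks like ZooKeeper's default session-timeout cap rather than anything HBase asked for. If I understand the defaults correctly, ZooKeeper clamps negotiated session timeouts to maxSessionTimeout, which defaults to 20 * tickTime (40 s with the usual tickTime of 2000 ms), so setting zookeeper.session.timeout higher in hbase-site.xml wouldn't take effect unless the ZooKeeper side is raised as well. Something like this (values are just an example):

# zoo.cfg
tickTime=2000
maxSessionTimeout=90000

<!-- hbase-site.xml -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>90000</value>
</property>

That wouldn't explain what is stalling the regionservers for 40+ seconds in the first place (long GC pauses are the usual suspect), but it would give them more headroom before ZooKeeper declares their sessions dead.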
