Hi,

I'm currently seeing an issue with the interaction between HBase and one of our 
applications that seems to occur when a request is made against a region while 
it's undergoing a major compaction.

The application gets a list of rowkeys from an index table, then for each block 
of 1000 rowkeys fetches the data from the main table by issuing a single get 
call with the list of keys (a multi-get). This normally works fine, and the 
process that fetches this data runs many times a second. Occasionally, however, 
it causes one of the region servers (a different one each time) to sit at 300% 
CPU permanently until I restart the client application.
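
For reference, the access pattern is roughly the following. This is only a 
minimal sketch against the 0.98 client API, not our real code: the index lookup 
and the result handling are stubbed out, and the class and method names are 
placeholders.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class BatchGetSketch {

    private static final int BATCH_SIZE = 1000;

    // Placeholder for the index-table lookup that produces the rowkeys.
    static List<byte[]> lookupRowKeysFromIndex() {
        return new ArrayList<byte[]>();
    }

    // Placeholder for whatever the application does with the results.
    static void process(Result[] results) {
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable mainTable = new HTable(conf, "requestData");
        try {
            List<byte[]> rowKeys = lookupRowKeysFromIndex();
            for (int i = 0; i < rowKeys.size(); i += BATCH_SIZE) {
                List<Get> gets = new ArrayList<Get>();
                for (byte[] key : rowKeys.subList(i,
                        Math.min(i + BATCH_SIZE, rowKeys.size()))) {
                    gets.add(new Get(key));
                }
                // One multi-get per block of up to 1000 keys against the main table.
                Result[] results = mainTable.get(gets);
                process(results);
            }
        } finally {
            mainTable.close();
        }
    }
}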

What I see in the regionserver logs is:

2015-03-06 07:01:49,726 INFO  [regionserver16020.leaseChecker] 
regionserver.HRegionServer: Scanner 14676901 lease expired on region 
requestData,230000000000,1407498276182.1b8c522e55b5f6bd5b60e007fa069237.
2015-03-06 07:01:49,794 WARN  [RpcServer.reader=4,port=16020] ipc.RpcServer: 
RpcServer.listener,port=16020: count of bytes read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at 
org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2229)
        at 
org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1415)
        at 
org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:790)
        at 
org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:581)
        at 
org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:556)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2015-03-06 07:01:49,804 INFO  
[rs(##########,16020,1417531485263)-snapshot-pool11246-thread-1] 
regionserver.HStore: Added 
hdfs://swcluster1/user/hbase/data/default/requestData/e11543982df2ce616c9efad5ce5c3784/data/04eff77d329b4f30965e94765072e5b6,
 entries=48, sequenceid=352934, filesize=15.2 K
2015-03-06 07:01:49,804 INFO  
[rs(##########,16020,1417531485263)-snapshot-pool11246-thread-1] 
regionserver.HRegion: Finished memstore flush of ~42.2 K/43200, currentsize=0/0 
for region 
requestData,370000000000,1407498276182.e11543982df2ce616c9efad5ce5c3784. in 
315ms, sequenceid=352934, compaction requested=false
2015-03-06 07:01:49,805 WARN  [RpcServer.handler=86,port=16020] ipc.RpcServer: 
RpcServer.respondercallId: 9816700 service: ClientService methodName: Scan 
size: 27 connection: a.b.c.d:36465: output error
2015-03-06 07:01:49,805 DEBUG 
[rs(##########,16020,1417531485263)-snapshot-pool11246-thread-1] 
regionserver.HRegion: Storing region-info for snapshot.
2015-03-06 07:01:49,822 WARN  [RpcServer.handler=86,port=16020] ipc.RpcServer: 
RpcServer.handler=86,port=16020: caught a ClosedChannelException, this means 
that the server was processing a request but the client went away. The error 
message was: null
2015-03-06 07:01:49,823 WARN  [RpcServer.handler=7,port=16020] ipc.RpcServer: 
RpcServer.respondercallId: 9816459 service: ClientService methodName: Scan 
size: 27 connection: a.b.c.d:36465: output error
2015-03-06 07:01:49,823 WARN  [RpcServer.handler=7,port=16020] ipc.RpcServer: 
RpcServer.handler=7,port=16020: caught a ClosedChannelException, this means 
that the server was processing a request but the client went away. The error 
message was: null

and then thousands of lines like the following:
2015-03-06 07:04:32,337 WARN  [RpcServer.handler=94,port=16020] ipc.RpcServer: 
RpcServer.respondercallId: 9818168 service: ClientService methodName: Scan 
size: 27 connection: a.b.c.d:37431: output error
2015-03-06 07:04:32,337 WARN  [RpcServer.handler=94,port=16020] ipc.RpcServer: 
RpcServer.handler=94,port=16020: caught a ClosedChannelException, this means 
that the server was processing a request but the client went away. The error 
message was: null
2015-03-06 07:04:32,338 WARN  [RpcServer.handler=32,port=16020] ipc.RpcServer: 
RpcServer.respondercallId: 9818236 service: ClientService methodName: Scan 
size: 27 connection: a.b.c.d:37431: output error
2015-03-06 07:04:32,338 WARN  [RpcServer.handler=32,port=16020] ipc.RpcServer: 
RpcServer.handler=32,port=16020: caught a ClosedChannelException, this means 
that the server was processing a request but the client went away. The error 
message was: null
2015-03-06 07:04:33,424 INFO  [regionserver16020.leaseChecker] 
regionserver.HRegionServer: Scanner 14677148 lease expired on region 
requestData,230000000000,1407498276182.1b8c522e55b5f6bd5b60e007fa069237.
2015-03-06 07:04:33,531 INFO  [regionserver16020.leaseChecker] 
regionserver.HRegionServer: Scanner 14677194 lease expired on region 
requestData,230000000000,1407498276182.1b8c522e55b5f6bd5b60e007fa069237.
2015-03-06 07:04:33,531 INFO  [regionserver16020.leaseChecker] 
regionserver.HRegionServer: Scanner 14677218 lease expired on region 
requestData,230000000000,1407498276182.1b8c522e55b5f6bd5b60e007fa069237.
2015-03-06 07:04:33,531 INFO  [regionserver16020.leaseChecker] 
regionserver.HRegionServer: Scanner 14677188 lease expired on region 
requestData,230000000000,1407498276182.1b8c522e55b5f6bd5b60e007fa069237.
2015-03-06 07:04:33,531 INFO  [regionserver16020.leaseChecker] 
regionserver.HRegionServer: Scanner 14677182 lease expired on region 
requestData,230000000000,1407498276182.1b8c522e55b5f6bd5b60e007fa069237.

until I restart the client.

I'm using HBase 0.98.3 on Hadoop 2.4.0.

Is there a specific way to handle this in our code to prevent the regionserver 
from permanently trying to process and return the data?
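
To make that concrete, the kind of "handling" I have in mind is bounding the 
client with explicit timeouts and a retry cap so a stuck call gives up rather 
than being retried indefinitely, along the lines of the sketch below. The config 
keys are the standard client settings as far as I know; the values are purely 
illustrative, not what we currently run.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ClientTimeoutSketch {

    // Illustrative values only -- not what we run in production.
    static Configuration boundedClientConf() {
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.rpc.timeout", 60000);               // per-RPC timeout (ms)
        conf.setInt("hbase.client.operation.timeout", 120000); // overall budget per client operation (ms)
        conf.setInt("hbase.client.retries.number", 5);         // cap the number of retries
        return conf;
    }
}

Is that the right direction, or is there something else we should be doing?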




-Ian
