Hello,

I’m running HBase 1.4.4. I’ve got a simple endpoint coprocessor that sums records when called. Whenever a region split occurs, calls to it fail with a NotServingRegionException. The client spends about 10 minutes retrying the call 35 times, logging:

2019-02-19 09:42:34 INFO o.a.h.h.c.RpcRetryingCaller [hconnection-0x100f9a76-shared--pool3-t215]: Call exception, tries=25, retries=35, started=331810 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: Region coprocessor-test,1,1550568604433.63f03f2a494dc5756238ba08af437af6. is not online on <hostname>,16020,1550568101996
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3082)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1275)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2201)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
row '1_pfx-cfb0e548-f399-4059-af80-54fe9b7a828f' on table 'coprocessor-test' at region=coprocessor-test,1_pfx-7b2b6071-7d2c-4282-9645-31ca027327dc6549,1550568988094.f6cc0c6245702c544fb7fe65c1e3299b., hostname=<hostname>l,16020,1550568101996, seqNum=630

before eventually failing:

Tue Feb 19 09:37:02 UTC 2019, RpcRetryingCaller{globalStartTime=1550569022304, pause=100, retries=35}, org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region coprocessor-test,9,1550568604433.2d98945e85cca401a2c5d8bd777a0451. is not online on <hostname>,16020,1550568099593
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3082)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1275)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2201)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)

If I then re-run the coprocessor, it works without any issues. So, I need a way to quickly catch this error and manually retry it until it works. I can't see a way to change any useful parameter – the 35 retries and the time between retries seem to be hardcoded.

Can anyone suggest how I can go about solving this?

Regards,
Ben
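
P.S. To make the "catch this error and manually retry" idea concrete, here is a minimal sketch of the wrapper I have in mind. It is plain Java with no HBase dependency: FailFastRetry and callWithRetry are hypothetical names of my own, and the Callable stands in for whatever code invokes the coprocessor. It only helps if the underlying call gives up quickly. I am assuming that could be achieved through the client settings hbase.client.retries.number and hbase.client.pause (the pause=100, retries=35 in the log look like their defaults), but I have not verified that coprocessor calls honour them.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a call a few times when a given exception type is seen. */
public final class FailFastRetry {
    private FailFastRetry() {}

    /**
     * Runs the call, retrying up to maxAttempts times when the thrown exception
     * (or anything in its cause chain) is an instance of retryOn.
     * Any other exception is rethrown immediately.
     */
    public static <T> T callWithRetry(Callable<T> call,
                                      Class<? extends Exception> retryOn,
                                      int maxAttempts,
                                      long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!isCausedBy(e, retryOn)) {
                    throw e; // not the error we want to retry on
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // short pause before the next attempt
                }
            }
        }
        throw last; // every attempt failed with the retryable error
    }

    /** Walks the cause chain, since client libraries often wrap the original exception. */
    private static boolean isCausedBy(Throwable t, Class<? extends Exception> type) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (type.isInstance(c)) {
                return true;
            }
        }
        return false;
    }
}
```

In my case the call would be the coprocessor invocation and retryOn would be NotServingRegionException.class, with something like 5 attempts and a pause of a second or two, on the theory that the client picks up the new region locations after the split by the next attempt.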