Hi Shen,

Please do not use "emergency support" or similar wording in the community; such wording does not encourage people to help with your question. People need time to understand your problem, and they will answer on a best-effort basis when they have time, but there is no guarantee. Please be aware of this.

"Everyone active in ASF projects is here as a volunteer, nobody is paid to provide support here." See: https://community.apache.org/newbiefaq.html#how-do-i-get-user-support-for-an-asf-project

As for your problem, could you please send one question per thread?

Thanks.
Luke

Best Regards!
---------------------
Luke Han

On Tue, Apr 24, 2018 at 3:04 PM, 沈鲁威 <[email protected]> wrote:

> There is no OOM or overload error in the region server log.
>
> Our HBase version is 1.2.0-cdh
>
> On Apr 24, 2018, at 1:59 PM, Ma Gang <[email protected]> wrote:
>
> You may check the region server log: is the related region server OOM or
> overloaded?
>
> At 2018-04-24 13:47:08, "沈鲁威" <[email protected]> wrote:
>
> >Additional exception details:
> >ylin.log:Caused by: org.apache.hadoop.hbase.DoNotRetryIOException:
> >org.apache.hadoop.hbase.DoNotRetryIOException: Coprocessor passed deadline!
> >Maybe server is overloaded
> >kylin.log- at
> >org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.checkDeadline(CubeVisitService.java:225)
> >kylin.log- at
> >org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.visitCube(CubeVisitService.java:259)
> >kylin.log- at
> >org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(CubeVisitProtos.java:5555)
> >kylin.log- at
> >org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7931)
> >kylin.log- at
> >org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1969)
> >kylin.log- at
> >org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1951)
> >kylin.log- at
> >org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33652)
> >kylin.log- at
> >org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2191)
> >kylin.log- at
> >org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> >kylin.log- at
> >org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
> >--
> >kylin.log- at
> >org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:107)
> >kylin.log- at
> >org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callMethod(CoprocessorRpcChannel.java:56)
> >kylin.log- at
> >org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitService$Stub.visitCube(CubeVisitProtos.java:5616)
> >kylin.log- at
> >org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC$2.call(CubeHBaseEndpointRPC.java:237)
> >kylin.log- at
> >org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC$2.call(CubeHBaseEndpointRPC.java:206)
> >kylin.log- at
> >org.apache.hadoop.hbase.client.HTable$15.call(HTable.java:1800)
> >kylin.log- at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >kylin.log- at
> >java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >kylin.log- at
> >java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >kylin.log- ... 1 more
> >kylin.log:Caused by:
> >org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
> > org.apache.hadoop.hbase.DoNotRetryIOException: Coprocessor passed deadline!
> >Maybe server is overloaded
> >kylin.log- at
> >org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.checkDeadline(CubeVisitService.java:225)
> >kylin.log- at
> >org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.visitCube(CubeVisitService.java:259)
> >kylin.log- at
> >org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(CubeVisitProtos.java:5555)
> >kylin.log- at
> >org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7931)
> >kylin.log- at
> >org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1969)
> >kylin.log- at
> >org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1951)
> >kylin.log- at
> >org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33652)
> >kylin.log- at
> >org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2191)
> >kylin.log- at
> >org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> >kylin.log- at
> >org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
> >> On Apr 23, 2018, at 10:51 PM, 沈鲁威 <[email protected]> wrote:
> >>
> >> Hi experts,
> >> We have set up cdh5.13.1 + kylin.2.3.0:
> >> one job node and three query nodes behind an SLB load balancer (4 cores, 8 GB)
> >>
> >> Problem: during operation, every so often queries on one of the query nodes start timing out, and right after that all queries become unavailable.
> >> Only after stopping that query node with kylin.sh stop do the other nodes work normally again.
> >>
> >> The machine load is not high.
> >> Errors that have appeared in the logs:
> >> 1. ncategorized SQLException for SQL []; SQL state [null]; error code [0];
> >> exception while executing query: java.io.IOException: POST failed, error
> >> code 500 and response: {"code":"999","data":null,"msg":"Timeout visiting
> >> cube! Check why coprocessor exception is not sent back?
In coprocessor
> >> Self-termination is checked every 100 scanned rows, the configured
> >> timeout(54000) cannot support this many scans?\nwhile executing SQL:
> >> \"select COALESCE(SUM(a.total_sale_money_kpi),0) as total_sale_money_kpi ,
> >> COALESCE(SUM(a.total_sale_count_kpi),0) as
> >> 2. by total_sale_money_kpi desc ### Cause: java.sql.SQLException: exception
> >> while executing query: java.io.IOException: POST failed, error code 500
> >> and response:
> >> {"code":"999","data":null,"msg":"org.apache.hadoop.hbase.DoNotRetryIOException:
> >> org.apache.hadoop.hbase.DoNotRetryIOException: Coprocessor passed
> >> deadline! Maybe server is overloaded at
> >> org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.checkDeadline(CubeVisitService.java:225)
> >> at
> >> org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.visitCube(CubeVisitService.java:259)
> >> at
> >> org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(CubeVisitProtos.java:5555)
> >> at
> >> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7931)
> >> at
> >> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1969)
> >> at
> >> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1951)
> >> at
> >> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33652)
> >> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2191) at
> >> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) at
> >> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
> >> at
> >> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:163)\nwhile
> >> executing SQL:
> >>
> >> <CE1ED564E277BCD093CB59000F043C9F.png>
> >>
> >> Inspecting with jstack:
> >>
> >> Case 1:
> >> Many threads are waiting on the same lock, at times more than 100.
> >> 
We suspect that some lock is being held, possibly a global one, because when one machine has the problem the other machines cannot run queries either.
> >>
> >> "kylin-coproc--pool2-t82051" #93742 daemon prio=5 os_prio=0
> >> tid=0x00007f314d435800 nid=0x1fb waiting on condition [0x00007f315abad000]
> >> java.lang.Thread.State: TIMED_WAITING (parking)
> >> at sun.misc.Unsafe.park(Native Method)
> >> - parking to wait for <0x00000007008eeff8> (a
> >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >> at
> >> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> >> at
> >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> >> at
> >> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >> at java.lang.Thread.run(Thread.java:748)
> >>
> >> Locked ownable synchronizers:
> >> - None
> >>
> >> "kylin-coproc--pool2-t82050" #93741 daemon prio=5 os_prio=0
> >> tid=0x00007f314dc24800 nid=0x1fa waiting on condition [0x00007f315c1bb000]
> >> java.lang.Thread.State: TIMED_WAITING (parking)
> >> at sun.misc.Unsafe.park(Native Method)
> >> - parking to wait for <0x00000007008eeff8> (a
> >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >> at
> >> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> >> at
> >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> >> at
> >> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> >> at
> >> 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >> at java.lang.Thread.run(Thread.java:748)
> >>
> >> Locked ownable synchronizers:
> >> - None
> >>
> >> Case 2:
> >> A thread pool problem, but so far we have not found which setting controls this thread pool's size.
> >>
> >> 2018-04-22 10:56:13,407 ERROR [pool-10-thread-806]
> >> v2.CubeHBaseEndpointRPC:340 : <sub-thread for Query
> >> 492811-3d81d0ee-b6c9-443b-b652-3f94f5072cd1-1524365662180 GTScanRequest
> >> 1578e6c>Error when visiting cubes by endpoint
> >> java.util.concurrent.RejectedExecutionException: Task
> >> java.util.concurrent.FutureTask@6006a8c3 rejected from
> >> java.util.concurrent.ThreadPoolExecutor@276cb5e4[Shutting down, pool size
> >> = 19, active threads = 19, queued tasks = 0, completed tasks = 90389]
> >> at
> >> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> >> at
> >> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
> >> at
> >> org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1795)
> >> at
> >> org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC.runEPRange(CubeHBaseEndpointRPC.java:205)
> >> at
> >> org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC.access$000(CubeHBaseEndpointRPC.java:69)
> >> at
> >> org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC$1.run(CubeHBaseEndpointRPC.java:186)
> >> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >> at java.lang.Thread.run(Thread.java:748)
> >> 2018-04-22 10:56:13,407 DEBUG [Query
> >> 
492811-e0a95289-a23e-4eb2-a1d2-e0000fd66ac4-1524365662196-116]
> >> gtrecord.GTCubeStorageQueryBase:311 : Need storage aggregation
> >> 2018-04-22 10:56:13,408 INFO  [Query
> >> 123629-8888aa31-e163-41c7-84d2-4b06a6b8da18-1524365659125-143]
> >> service.QueryService:1134 : Processed rows for each storageContext: 7
> >> 2018-04-22 10:56:13,408 ERROR [pool-10-thread-800]
> >> v2.CubeHBaseEndpointRPC:340 : <sub-thread for Query
> >> 492811-3d81d0ee-b6c9-443b-b652-3f94f5072cd1-1524365662180 GTScanRequest
> >> 5677c55d>Error when visiting cubes by endpoint
> >> java.util.concurrent.RejectedExecutionException: Task
> >> java.util.concurrent.FutureTask@6006a8c3 rejected from
> >> java.util.concurrent.ThreadPoolExecutor@276cb5e4[Shutting down, pool size
> >> = 19, active threads = 19, queued tasks = 0, completed tasks = 90389]
> >> at
> >> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> >> at
> >> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
> >> at
> >> org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1795)
> >> at
> >> org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC.runEPRange(CubeHBaseEndpointRPC.java:205)
> >> at
> >> org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC.access$000(CubeHBaseEndpointRPC.java:69)
> >> at
> >> org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC$1.run(CubeHBaseEndpointRPC.java:186)
> >> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >> at
java.lang.Thread.run(Thread.java:748)
> >>
> >> <091BC280DBCABF12925C7456BF791602.jpg>
> >>
> >> Case 3: the following error has appeared
> >> hangzhou.dianjia.io trying to unlock
> >> /kylin/kylin_metadata/job_engine/global_job_engine_lock
> >> kylin.out: at
> >> org.apache.kylin.storage.hbase.util.ZookeeperDistributedLock.unlock(ZookeeperDistributedLock.java:236)
> >> kylin.out: at
> >> org.apache.kylin.storage.hbase.util.ZookeeperDistributedLock.unlockJobEngine(ZookeeperDistributedLock.java:311)
> >> kylin.out: at
> >> org.apache.kylin.storage.hbase.util.ZookeeperJobLock.unlockJobEngine(ZookeeperJobLock.java:86)
> >> kylin.out- at
> >> org.apache.kylin.job.impl.threadpool.DefaultScheduler.shutdown(DefaultScheduler.java:234)
> >> kylin.out- at
> >> org.apache.kylin.rest.service.JobService$2.run(JobService.java:140)
> >> kylin.out- at java.lang.Thread.run(Thread.java:748)
> >> kylin.out-Caused by: java.lang.IllegalStateException: Client is not started
> >> kylin.out- at
> >> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> >> kylin.out: at
> >> org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
> >> kylin.out- at
> >> org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:477)
> >> kylin.out- at
> >> org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:238)
> >> kylin.out- at
> >> org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:233)
> >> kylin.out- at
> >> org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> >> kylin.out- at
> >> org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)
> >> kylin.out- at
> >> org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:214)
> >> kylin.out- at
> >> org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:41)
> >> kylin.out: at
> >> 
org.apache.kylin.storage.hbase.util.ZookeeperDistributedLock.unlock(ZookeeperDistributedLock.java:231)
> >>
> >> We suspected the following code,
> >> but we tested removing the synchronized lock and the problem remained.
> >> In many cases, execution is very slow between the 66666 and 77777 markers in the screenshot below.
> >> <B5ED37ABAEE71EB70911E69D10DD3252.png>
> >>
> >> <kylin配置.txt>
