Hi Kylin users,
I encountered a strange timeout error today when building a cube.
By "strange", I mean that "hbase.rpc.timeout" is set to 60000 in HBase,
but I get "org.apache.hadoop.hbase.ipc.CallTimeoutException: Call
id=8099904, waitTime=5001, operationTimeout=5000 expired" errors.
The Kylin version is 2.2.0, running on EMR. It ran without errors for about
half a month and then suddenly stopped working. The cube being built is not
the biggest one.
I am wondering where I should look. Any help is appreciated.
The stack trace from the log:
```
2017-12-15 06:46:57,892 ERROR [Scheduler 2090031901 Job c9067736-eac7-48ad-88f3-dbd6f4e870ae-167] execution.ExecutableManager:149 : fail to get job output:c9067736-eac7-48ad-88f3-dbd6f4e870ae-14
org.apache.kylin.job.exception.PersistentException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
Fri Dec 15 14:46:57 GMT+08:00 2017, RpcRetryingCaller{globalStartTime=1513320412890, pause=100, retries=1}, java.io.IOException: Call to ip-172-31-5-71.cn-north-1.compute.internal/172.31.5.71:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=8099904, waitTime=5001, operationTimeout=5000 expired.
    at org.apache.kylin.job.dao.ExecutableDao.getJobOutput(ExecutableDao.java:202)
    at org.apache.kylin.job.execution.ExecutableManager.getOutput(ExecutableManager.java:145)
    at org.apache.kylin.job.execution.AbstractExecutable.getOutput(AbstractExecutable.java:312)
    at org.apache.kylin.job.execution.AbstractExecutable.isDiscarded(AbstractExecutable.java:392)
    at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:149)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:144)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
Fri Dec 15 14:46:57 GMT+08:00 2017, RpcRetryingCaller{globalStartTime=1513320412890, pause=100, retries=1}, java.io.IOException: Call to ip-172-31-5-71.cn-north-1.compute.internal/172.31.5.71:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=8099904, waitTime=5001, operationTimeout=5000 expired.
```
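
To make the mismatch explicit, here is a small sketch (plain Python, not part of Kylin or HBase) that pulls the numbers out of the exception message and compares them with the value I configured. The names `parse_timeouts` and `configured_rpc_timeout_ms` are made up for this illustration only.

```python
import re

# Exception message copied from the log in this mail.
message = (
    "org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=8099904, "
    "waitTime=5001, operationTimeout=5000 expired."
)

# hbase.rpc.timeout as set on the HBase side, in milliseconds.
configured_rpc_timeout_ms = 60000


def parse_timeouts(text):
    """Return every integer key=value field found in the message as a dict."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", text)}


fields = parse_timeouts(message)
print("waitTime (ms):", fields["waitTime"])
print("operationTimeout (ms):", fields["operationTimeout"])
print(
    "operationTimeout equals hbase.rpc.timeout:",
    fields["operationTimeout"] == configured_rpc_timeout_ms,
)
```

The call was dropped after a 5000 ms limit with a single attempt (retries=1 in the log), not after the 60000 ms set in HBase, so the limit in effect presumably comes from somewhere other than the HBase-side setting.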