Hi Billy,

Thank you for pointing me to the previous discussion. For now we are running a very small HBase cluster to keep costs down, with only one slave node, so an unsteady response time (within a range that is not too bad, e.g. within 1 minute) is acceptable. The previous timeout error interrupted the cube building procedure, and we don't want that. What is your suggestion for this use case?

On 2017-12-16 at 11:48, "Billy Liu"<[email protected]> wrote:

Check this: http://apache-kylin.74782.x6.nabble.com/hbase-configed-with-fixed-value-td9241.html

2017-12-15 18:03 GMT+08:00 jxs <[email protected]>:

Hi,

Finally, I found this in org.apache.kylin.storage.hbase.HBaseResourceStore:

```
private StorageURL buildMetadataUrl(KylinConfig kylinConfig) throws IOException {
    StorageURL url = kylinConfig.getMetadataUrl();
    if (!url.getScheme().equals("hbase"))
        throw new IOException("Cannot create HBaseResourceStore. Url not match. Url: " + url);

    // control timeout for prompt error report
    Map<String, String> newParams = new LinkedHashMap<>();
    newParams.put("hbase.client.scanner.timeout.period", "10000");
    newParams.put("hbase.rpc.timeout", "5000");
    newParams.put("hbase.client.retries.number", "1");
    newParams.putAll(url.getAllParameters());

    return url.copy(newParams);
}
```

Is this related to the timeout error? Why are these params hard-coded instead of being read from configuration? Is there any workaround for this timeout error?

On 2017-12-15 at 16:03, "jxs"<[email protected]> wrote:

Hi Kylin users,

I encountered a strange timeout error today when building a cube. By "strange", I mean that "hbase.rpc.timeout" is set to 60000 in HBase, but I get "org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=8099904, waitTime=5001, operationTimeout=5000 expired" errors.

I am using Kylin 2.2.0 on EMR. It ran without error for about half a month and then suddenly stopped working; the cube being built is not the biggest one. I am wondering where I should look. Any help is appreciated.

The traceback from the log:

```
2017-12-15 06:46:57,892 ERROR [Scheduler 2090031901 Job c9067736-eac7-48ad-88f3-dbd6f4e870ae-167] execution.ExecutableManager:149 : fail to get job output:c9067736-eac7-48ad-88f3-dbd6f4e870ae-14
org.apache.kylin.job.exception.PersistentException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
Fri Dec 15 14:46:57 GMT+08:00 2017, RpcRetryingCaller{globalStartTime=1513320412890, pause=100, retries=1}, java.io.IOException: Call to ip-172-31-5-71.cn-north-1.compute.internal/172.31.5.71:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=8099904, waitTime=5001, operationTimeout=5000 expired.
    at org.apache.kylin.job.dao.ExecutableDao.getJobOutput(ExecutableDao.java:202)
    at org.apache.kylin.job.execution.ExecutableManager.getOutput(ExecutableManager.java:145)
    at org.apache.kylin.job.execution.AbstractExecutable.getOutput(AbstractExecutable.java:312)
    at org.apache.kylin.job.execution.AbstractExecutable.isDiscarded(AbstractExecutable.java:392)
    at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:149)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:144)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
Fri Dec 15 14:46:57 GMT+08:00 2017, RpcRetryingCaller{globalStartTime=1513320412890, pause=100, retries=1}, java.io.IOException: Call to ip-172-31-5-71.cn-north-1.compute.internal/172.31.5.71:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=8099904, waitTime=5001, operationTimeout=5000 expired.
```
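
A note on the quoted buildMetadataUrl(): the operationTimeout=5000 and retries=1 in the traceback match the values it sets, and those values are put into the map before url.getAllParameters() is applied, so a parameter with the same key on the metadata URL would appear to replace them. A minimal sketch of that merge order, using only java.util (the class name, method name, and example URL parameters are hypothetical and not Kylin code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TimeoutOverrideSketch {

    // Mirrors the merge order in the quoted buildMetadataUrl():
    // the three defaults go in first, then the URL's own parameters.
    static Map<String, String> mergeParams(Map<String, String> urlParams) {
        Map<String, String> merged = new LinkedHashMap<>();
        merged.put("hbase.client.scanner.timeout.period", "10000");
        merged.put("hbase.rpc.timeout", "5000");
        merged.put("hbase.client.retries.number", "1");
        // putAll replaces the value of any key that is already present.
        merged.putAll(urlParams);
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical parameters, standing in for url.getAllParameters().
        Map<String, String> urlParams = new LinkedHashMap<>();
        urlParams.put("hbase.rpc.timeout", "60000");
        urlParams.put("hbase.client.retries.number", "3");

        // The scanner timeout keeps its default; the other two are overridden.
        System.out.println(mergeParams(urlParams));
    }
}
```

If this reading is right, supplying these parameters on the metadata URL in kylin.properties may be a workaround. How parameters are written on that URL is an assumption here and should be checked against the StorageURL class for the Kylin version in use.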
