When it goes down, there is a notification with errors like so:

Error Log:
java.lang.RuntimeException: org.apache.kylin.job.exception.PersistentException: java.net.SocketTimeoutException: callTimeout=60000, callDuration=69049: row '/execute_output/28724e3b-72e4-4ab0-ab06-4ab6aad3d7bb' on table 'kylin_metadata' at region=kylin_metadata,,1467291011009.2cc83fc3fb51700a8a9884e5c5401e20., hostname=ip-10-74-61-15.ec2.internal,16020,1523374325020, seqNum=399271
    at org.apache.kylin.job.manager.ExecutableManager.getOutput(ExecutableManager.java:128)
    at org.apache.kylin.job.execution.AbstractExecutable.getStatus(AbstractExecutable.java:193)
    at org.apache.kylin.job.execution.AbstractExecutable.toString(AbstractExecutable.java:379)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:115)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kylin.job.exception.PersistentException: java.net.SocketTimeoutException: callTimeout=60000, callDuration=69049: row '/execute_output/28724e3b-72e4-4ab0-ab06-4ab6aad3d7bb' on table 'kylin_metadata' at region=kylin_metadata,,1467291011009.2cc83fc3fb51700a8a9884e5c5401e20., hostname=ip-10-74-61-15.ec2.internal,16020,1523374325020, seqNum=399271
    at org.apache.kylin.job.dao.ExecutableDao.getJobOutput(ExecutableDao.java:195)
    at org.apache.kylin.job.manager.ExecutableManager.getOutput(ExecutableManager.java:123)
    ... 7 more
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=69049: row '/execute_output/28724e3b-72e4-4ab0-ab06-4ab6aad3d7bb' on table 'kylin_metadata' at region=kylin_metadata,,1467291011009.2cc83fc3fb51700a8a9884e5c5401e20., hostname=ip-10-74-61-15.ec2.internal,16020,1523374325020, seqNum=399271
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:889)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:855)
    at org.apache.kylin.storage.hbase.HBaseResourceStore.internalGetFromHTable(HBaseResourceStore.java:332)
    at org.apache.kylin.storage.hbase.HBaseResourceStore.getFromHTable(HBaseResourceStore.java:312)
    at org.apache.kylin.storage.hbase.HBaseResourceStore.getResourceImpl(HBaseResourceStore.java:224)
    at org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:140)
    at org.apache.kylin.job.dao.ExecutableDao.readJobOutputResource(ExecutableDao.java:93)
    at org.apache.kylin.job.dao.ExecutableDao.getJobOutput(ExecutableDao.java:186)
    ... 8 more
Caused by: java.io.IOException: Call to ip-10-74-61-15.ec2.internal/10.74.61.15:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=713832, waitTime=60001, operationTimeout=60000 expired.
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1262)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1230)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:32627)
    at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:881)
    at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:872)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
    ... 16 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=713832, waitTime=60001, operationTimeout=60000 expired.
    at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1204)
    ... 22 more

On Tue, Apr 17, 2018 at 10:44 AM, Sonny Heer <[email protected]> wrote:

> OK it does move to another RegionServer. we're doing more testing, but it
> appears DN that hosts the kylin_metadata goes down sometimes. Sometimes
> the same job succeeds...
>
> On Tue, Apr 17, 2018 at 10:36 AM, Sonny Heer <[email protected]> wrote:
>
>> Not sure if this is normal or not, but I see kylin metadata is on a
>> single region server (DN & RS on node).
>>
>> if this datanode goes down... it appears kylin isn't able to pull jobs
>> for monitor or complete jobs?
>>
>>
>> hbase requests:
>>
>> kylin_metadata,,1467291011009.2cc83fc3fb51700a8a9884e5c5401e20. 5534555
>> 49271
>>
>> ^^
>>
>>
>
