Hi Vineet,

Have you checked whether /etc/nsswitch.conf and /etc/resolv.conf are consistent across all hosts? I've seen intermittent issues with MR jobs when the 'hosts:' entry in /etc/nsswitch.conf lists dns first and either no nscd daemon is running or the DNS server is flaky, so the host name cannot be resolved.
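
To make that check concrete, here is a minimal sketch in Python (not something from the original report). The helper names `hosts_sources` and `count_failures` are made up for illustration. The first reads the lookup order from the text of an nsswitch.conf file, and the second resolves a host name repeatedly to catch an intermittent failure. It assumes Python 3 is available on each node.

```python
import re
import socket


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, in order (hypothetical helper)."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            # Remove bracketed action items such as [NOTFOUND=return]
            entries = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return entries.split()
    return []


def count_failures(hostname, attempts=20, resolve=socket.gethostbyname):
    """Resolve hostname `attempts` times and count failed lookups (hypothetical helper)."""
    failures = 0
    for _ in range(attempts):
        try:
            resolve(hostname)
        except OSError:  # socket.gaierror is a subclass of OSError
            failures += 1
    return failures
```

On each node you could pass the contents of /etc/nsswitch.conf to `hosts_sources` and compare the result across the cluster, then run `count_failures("prod-hadoop-data02")`. A non-zero count on any node would point at flaky resolution rather than at HBase itself.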

cheers,
esteban.

--
Cloudera, Inc.

On Wed, Jul 15, 2015 at 5:16 AM, Vineet Mishra <[email protected]> wrote:

> Hi All,
>
> I am facing a strange issue, I am running a Hbase Bulk Load to load a Hfile
> to my hbase table, while running the same I am landing into the same issue
> over and over again.
>
> java.io.IOException: BulkLoad encountered an unrecoverable problem
>     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:381)
>     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:310)
>     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:896)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.kylin.job.hadoop.hbase.BulkLoadJob.run(BulkLoadJob.java:83)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
> Tue Jul 14 23:18:48 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@216b1af0, java.net.UnknownHostException: unknown host: prod-hadoop-data02
> Tue Jul 14 23:18:49 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@216b1af0, java.net.UnknownHostException: unknown host: prod-hadoop-data02
>
> So there are multiple jobs which are initiating table load process out of
> which one of the job is failing intermittently, let me clarify the failing
> job is not the same everytime.
> So for instance last time my job 1 got failed but today its job 2, but out
> of all the exception remains the same.
>
> I am having the respective host entry on the all my hosts hadoop-yarn 10
> node cluster(10 data node).
>
> Prominent suggestion are appreciated.
>
> Thanks!
