Hi Vineet,

Have you checked whether /etc/nsswitch.conf and /etc/resolv.conf are
consistent across all hosts? I've seen intermittent issues with MR jobs when
the 'hosts:' entry in /etc/nsswitch.conf lists dns first and either no nscd
daemon is running or the DNS server is flaky, so the host name cannot be
resolved.
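
If it helps to confirm this, here is a rough sketch of a check you could run
on each node. It is only an illustration: the function names
probe_resolution and hosts_sources are made up for this example, and the
attempt count and delay are arbitrary defaults. It reads the lookup order
from the 'hosts:' line and resolves a host name repeatedly to catch
intermittent failures.

```python
import socket
import time


def hosts_sources(nsswitch_text):
    """Return the lookup order from the 'hosts:' line of nsswitch.conf text."""
    for line in nsswitch_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def probe_resolution(hostname, attempts=20, delay=0.5,
                     resolver=socket.gethostbyname):
    """Resolve `hostname` repeatedly; return (attempt index, error) pairs.

    `resolver` is injectable so the loop can be exercised without a network.
    """
    failures = []
    for i in range(attempts):
        try:
            resolver(hostname)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures.append((i, str(exc)))
        time.sleep(delay)
    return failures
```

For example, hosts_sources(open("/etc/nsswitch.conf").read()) shows whether
dns comes before files on that node, and a non-empty result from
probe_resolution("prod-hadoop-data02") would suggest that resolution is
flaky there.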

cheers,
esteban.

--
Cloudera, Inc.


On Wed, Jul 15, 2015 at 5:16 AM, Vineet Mishra <[email protected]>
wrote:

> Hi All,
>
> I am facing a strange issue. I am running an HBase bulk load to load an
> HFile into my HBase table, and I keep hitting the same error over and over
> again.
>
> java.io.IOException: BulkLoad encountered an unrecoverable problem
> at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:381)
> at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:310)
> at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:896)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.kylin.job.hadoop.hbase.BulkLoadJob.run(BulkLoadJob.java:83)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
> Tue Jul 14 23:18:48 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@216b1af0, java.net.UnknownHostException: unknown host: prod-hadoop-data02
> Tue Jul 14 23:18:49 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@216b1af0, java.net.UnknownHostException: unknown host: prod-hadoop-data02
>
> Multiple jobs initiate the table load process, and one of them fails
> intermittently. To clarify, the failing job is not the same every time:
> last time job 1 failed, but today it was job 2. In every case the
> exception is the same.
>
> I have the respective host entry on all hosts of my 10-node hadoop-yarn
> cluster (10 data nodes).
>
> Prompt suggestions are appreciated.
>
> Thanks!
>
