It looks like a network problem in the Hadoop cluster. Check the cluster's
health status and then retry the job.
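
As a quick first check, you can test whether the address from the log still accepts TCP connections from the Kylin node. The snippet below is only a minimal sketch: the helper name `can_connect` and the 3-second timeout are my own choices, and the host and port are copied from the "Retrying connect to server" line in your log. A refused connection does not prove a network fault by itself. Since the log also reports FinalApplicationStatus=FAILED, the process that was listening on that port may simply have exited already.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only tests reachability
    and says nothing about the health of the service behind the port.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the quoted log; adjust for your cluster.
    host, port = "mesos1.com", 17688
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

If the port is not reachable, the YARN logs of the failed application (job_1507428331796_0728 in your log) are probably the next place to look.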

2017-10-10 12:57 GMT+08:00 s丶影中人* <[email protected]>:

> When the amount of data is large, the job ends in ERROR. How can I
> solve it? The log is as follows:
>
> 2017-10-10 12:40:57,539 INFO  [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860]
> ipc.Client:898 : Retrying connect to server: mesos1.com/10.142.20.62:17688.
> Already tried 2 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2017-10-10 12:40:57,539 WARN  [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860]
> ipc.Client:886 : Failed to connect to server:
> mesos1.com/10.142.20.62:17688: retries get failed due to exceeded maximum
> allowed retries number: 3
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(
> SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> at org.apache.hadoop.ipc.Client$Connection.setupConnection(
> Client.java:648)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:744)
> at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
> at org.apache.hadoop.ipc.Client.call(Client.java:1480)
> at org.apache.hadoop.ipc.Client.call(Client.java:1441)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.
> invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy101.getJobReport(Unknown Source)
> at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.
> MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.
> java:133)
> at sun.reflect.GeneratedMethodAccessor161.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(
> ClientServiceDelegate.java:324)
> at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(
> ClientServiceDelegate.java:423)
> at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:698)
> at org.apache.hadoop.mapreduce.Job$1.run(Job.java:326)
> at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1917)
> at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323)
> at org.apache.hadoop.mapreduce.Job.getStatus(Job.java:341)
> at org.apache.kylin.engine.mr.common.HadoopJobStatusChecker.checkStatus(
> HadoopJobStatusChecker.java:38)
> at org.apache.kylin.engine.mr.common.MapReduceExecutable.
> doWork(MapReduceExecutable.java:152)
> at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:125)
> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(
> DefaultChainedExecutable.java:65)
> at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:125)
> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(
> DefaultScheduler.java:141)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2017-10-10 12:40:57,644 INFO  [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860]
> mapred.ClientServiceDelegate:277 : Application state is completed.
> FinalApplicationStatus=FAILED. Redirecting to job history server
> 2017-10-10 12:40:57,652 WARN  [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860]
> common.HadoopCmdOutput:92 : no counters for job job_1507428331796_0728
> 2017-10-10 12:40:57,657 INFO  [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860]
> execution.ExecutableManager:425 : job id:5beb660a-3b2f-4697-a5ac-07ba826d2808-02
> from RUNNING to ERROR
> 2017-10-10 12:40:57,663 INFO  [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860]
> execution.ExecutableManager:425 : job id:5beb660a-3b2f-4697-a5ac-07ba826d2808
> from RUNNING to ERROR
> 2017-10-10 12:40:57,663 DEBUG [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860]
> execution.AbstractExecutable:259 : no need to send email, user list is
> empty
> 2017-10-10 12:40:57,677 INFO  [pool-8-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual
> running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0
> others
> 2017-10-10 12:41:20,927 INFO  [pool-8-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual
> running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0
> others
> 2017-10-10 12:41:50,928 INFO  [pool-8-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual
> running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0
> others
> 2017-10-10 12:42:20,926 INFO  [pool-8-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual
> running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0
> others
> 2017-10-10 12:42:50,926 INFO  [pool-8-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual
> running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0
> others
> 2017-10-10 12:43:20,926 INFO  [pool-8-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual
> running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0
> others
> 2017-10-10 12:43:50,927 INFO  [pool-8-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual
> running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0
> others
> 2017-10-10 12:44:12,055 DEBUG [http-bio-7070-exec-9]
> badquery.BadQueryHistoryManager:90 : Loaded 1 Bad Query(s)
> 2017-10-10 12:44:20,927 INFO  [pool-8-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual
> running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0
> others
> 2017-10-10 12:44:50,930 INFO  [pool-8-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual
> running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0
> others
> 2017-10-10 12:45:20,928 INFO  [pool-8-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual
> running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0
> others
> 2017-10-10 12:45:50,928 INFO  [pool-8-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual
> running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0
> others
>
>


-- 
Best regards,

Shaofeng Shi 史少锋
