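It looks like a network problem in the Hadoop cluster: the client gets "Connection refused" when connecting to mesos1.com/10.142.20.62:17688. Check the cluster's health status and then retry.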
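One quick part of that health check is to test whether the host and port from the log accept TCP connections at all. Below is a minimal Python 3 sketch, assuming Python is available on the Kylin node. The function name `can_connect` and its parameters are hypothetical, and the retry loop is only loosely modeled on the `RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)` policy printed in the log. It checks reachability only and says nothing about why the job itself failed.

```python
import socket
import time


def can_connect(host: str, port: int, max_retries: int = 3,
                sleep_seconds: float = 1.0, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: makes one initial attempt plus up to `max_retries`
    retries, sleeping a fixed `sleep_seconds` between attempts (loosely
    modeled on Hadoop's RetryUpToMaximumCountWithFixedSleep policy).
    """
    for attempt in range(max_retries + 1):
        try:
            # Resolve and connect; the socket is closed immediately because
            # we only care whether the connection can be established.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError as exc:  # refused, timed out, DNS failure, ...
            print(f"attempt {attempt + 1} to {host}:{port} failed: {exc}")
            if attempt < max_retries:
                time.sleep(sleep_seconds)
    return False
```

For example, `can_connect("mesos1.com", 17688)` run on the Kylin server repeats the connection that failed in the log. Keep in mind that the address in such a message may belong to a short-lived process, so a refused connection now does not by itself prove the network was the cause at the time of the error.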
2017-10-10 12:57 GMT+08:00 s丶影中人* <[email protected]>:
> When the amount of data is large, an ERROR is reported. How can I
> solve it? Log as follows:
>
> 2017-10-10 12:40:57,539 INFO [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860] ipc.Client:898 : Retrying connect to server: mesos1.com/10.142.20.62:17688. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2017-10-10 12:40:57,539 WARN [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860] ipc.Client:886 : Failed to connect to server: mesos1.com/10.142.20.62:17688: retries get failed due to exceeded maximum allowed retries number: 3
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:648)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:744)
>     at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1480)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>     at com.sun.proxy.$Proxy101.getJobReport(Unknown Source)
>     at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)
>     at sun.reflect.GeneratedMethodAccessor161.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:324)
>     at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:423)
>     at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:698)
>     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:326)
>     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
>     at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323)
>     at org.apache.hadoop.mapreduce.Job.getStatus(Job.java:341)
>     at org.apache.kylin.engine.mr.common.HadoopJobStatusChecker.checkStatus(HadoopJobStatusChecker.java:38)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:152)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:65)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:141)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-10-10 12:40:57,644 INFO [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860] mapred.ClientServiceDelegate:277 : Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
> 2017-10-10 12:40:57,652 WARN [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860] common.HadoopCmdOutput:92 : no counters for job job_1507428331796_0728
> 2017-10-10 12:40:57,657 INFO [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860] execution.ExecutableManager:425 : job id:5beb660a-3b2f-4697-a5ac-07ba826d2808-02 from RUNNING to ERROR
> 2017-10-10 12:40:57,663 INFO [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860] execution.ExecutableManager:425 : job id:5beb660a-3b2f-4697-a5ac-07ba826d2808 from RUNNING to ERROR
> 2017-10-10 12:40:57,663 DEBUG [Job 5beb660a-3b2f-4697-a5ac-07ba826d2808-860] execution.AbstractExecutable:259 : no need to send email, user list is empty
> 2017-10-10 12:40:57,677 INFO [pool-8-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0 others
> 2017-10-10 12:41:20,927 INFO [pool-8-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0 others
> 2017-10-10 12:41:50,928 INFO [pool-8-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0 others
> 2017-10-10 12:42:20,926 INFO [pool-8-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0 others
> 2017-10-10 12:42:50,926 INFO [pool-8-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0 others
> 2017-10-10 12:43:20,926 INFO [pool-8-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0 others
> 2017-10-10 12:43:50,927 INFO [pool-8-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0 others
> 2017-10-10 12:44:12,055 DEBUG [http-bio-7070-exec-9] badquery.BadQueryHistoryManager:90 : Loaded 1 Bad Query(s)
> 2017-10-10 12:44:20,927 INFO [pool-8-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0 others
> 2017-10-10 12:44:50,930 INFO [pool-8-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0 others
> 2017-10-10 12:45:20,928 INFO [pool-8-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0 others
> 2017-10-10 12:45:50,928 INFO [pool-8-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 45 already succeed, 31 error, 0 discarded, 0 others

--
Best regards,

Shaofeng Shi 史少锋
