This sounds like a problem with the YARN ResourceManager (RM); check the heap
size configured for the RM.
Kylin is losing connectivity with the RM.
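
One way to check the RM heap, as a rough sketch: Hadoop daemons expose JVM
metrics through a /jmx endpoint on their web port. The snippet below reads
the java.lang:type=Memory bean and reports used/max heap. The host name
"rm-host.example.com" and port 8088 are assumptions for illustration, and it
assumes the RM web UI is reachable over plain HTTP without authentication.

```python
import json
from urllib.request import urlopen


def heap_usage_ratio(jmx_payload: dict) -> float:
    """Return used/max heap from a Hadoop /jmx response (java.lang:type=Memory)."""
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") == "java.lang:type=Memory":
            heap = bean["HeapMemoryUsage"]
            # "max" can be -1 when the JVM reports no limit; fall back to "committed".
            limit = heap["max"] if heap["max"] > 0 else heap["committed"]
            return heap["used"] / limit
    raise ValueError("java.lang:type=Memory bean not found")


def fetch_rm_heap_ratio(rm_http_address: str, timeout: float = 10.0) -> float:
    """Query the RM web endpoint; rm_http_address is 'host:port' (assumed 8088)."""
    url = f"http://{rm_http_address}/jmx?qry=java.lang:type=Memory"
    with urlopen(url, timeout=timeout) as resp:
        return heap_usage_ratio(json.load(resp))


if __name__ == "__main__":
    # Hypothetical address; replace with your own ResourceManager host.
    ratio = fetch_rm_heap_ratio("rm-host.example.com:8088")
    print(f"RM heap used: {ratio:.0%}")
```

If the ratio stays close to 100% while the cluster is busy, the RM is probably
short on heap and its heap size setting is worth raising.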

2017-02-13 17:00 GMT+01:00 不清 <452652...@qq.com>:

> Hello, Kylin community!
>
> Sometimes my jobs stop unexpectedly. This can happen at any step.
>
> The Kylin log looks like this:
> 2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8]
> hbase.HBaseResourceStore:262 : Update row 
> /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02
> from oldTs: 1486999611524, to newTs: 1486999621545, operation result: true
> 2017-02-13 23:27:13,384 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 0
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
> sleepTime=1000 MILLISECONDS)
> 2017-02-13 23:27:14,387 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 1
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
> sleepTime=1000 MILLISECONDS)
> 2017-02-13 23:27:15,388 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 2
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
> sleepTime=1000 MILLISECONDS)
> 2017-02-13 23:27:15,495 INFO  [pool-8-thread-8]
> mapred.ClientServiceDelegate:273 : Application state is completed.
> FinalApplicationStatus=KILLED. Redirecting to job history server
> 2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 :
> updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02
>
> The CM log looks like this:
> Job Name: Kylin_Cube_Builder_user_all_cube_2_only_msisdn
> User Name: tmn
> Queue: root.tmn
> State: KILLED
> Uberized: false
> Submitted: Sun Feb 12 19:19:24 CST 2017
> Started: Sun Feb 12 19:19:38 CST 2017
> Finished: Sun Feb 12 20:30:13 CST 2017
> Elapsed: 1hrs, 10mins, 35sec
> Diagnostics:
> Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at
> 10.180.212.38
> Job received Kill while in RUNNING state.
> Average Map Time 24mins, 48sec
>
> The MapReduce job log shows:
> Task KILL is received. Killing attempt!
>
> When this happens, resuming the job succeeds. In other words, the job is
> not stopped by an error.
>
> What is the problem?
>
> My Hadoop cluster is very busy, and this situation happens very often.
>
> Can I set the number of retries and the retry interval?
>
