[ https://issues.apache.org/jira/browse/YARN-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007779#comment-14007779 ]

Vinod Kumar Vavilapalli edited comment on YARN-2095 at 5/23/14 11:13 PM:
-------------------------------------------------------------------------

Vinod, could you read the email below? Would you agree that there should be a
log entry from YARN in this case?

Clay,

What I noticed is that your reducers were overloaded and on the brink of
running out of memory. The Java heaps were running at 99% and continuously
GC’ing while the app was reading from disk, so it was trying its best to
process the job with limited resources. I agree with you that it would help
with debugging if the container could log a message reporting that there were
GC issues.

Thanks,


was (Author: [email protected]):
Vinod, could you read Eric's email below? Would you agree that there should be
a log entry from YARN in this case?

Clay McDonald 


From: Eric Mizell [mailto:[email protected]] 
Sent: Friday, May 23, 2014 4:18 PM
To: Clay McDonald
Subject: Re: [jira] [Created] (YARN-2095) Large MapReduce Job stops responding

Clay,

What I noticed is that your reducers were overloaded and on the brink of
running out of memory. The Java heaps were running at 99% and continuously
GC’ing while the app was reading from disk, so it was trying its best to
process the job with limited resources. I agree with you that it would help
with debugging if the container could log a message reporting that there were
GC issues.

Thanks,

Eric Mizell  Director Solution Engineering, Hortonworks
Website: http://www.hortonworks.com/


> Large MapReduce Job stops responding
> ------------------------------------
>
>                 Key: YARN-2095
>                 URL: https://issues.apache.org/jira/browse/YARN-2095
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>         Environment: CentOS 6.3 (x86_64) on VMware 10 running HDP-2.0.6
>            Reporter: Clay McDonald
>            Priority: Blocker
>
> Very large jobs (7,455 mappers and 999 reducers) hang. The jobs run well, but
> logging to the container logs stops after 33 hours of running. The job appears
> to be hung: its status remains "RUNNING", and no error messages appear in the logs.
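The GC-logging idea raised in this thread (a container saying so in its log when its JVM is stuck collecting a nearly full heap) can be sketched with the standard java.lang.management beans. This is a hypothetical helper, not YARN or MapReduce code: the class name GcPressureMonitor, its thresholds (95% of max heap used, 50% of wall time spent in GC) and the warning format are all assumptions chosen for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of Hadoop/YARN): samples heap usage and
// cumulative GC time from the standard JMX beans and produces a warning when
// the JVM appears to spend most of its time collecting a nearly full heap.
final class GcPressureMonitor {
    // Illustrative thresholds (assumptions, not YARN defaults).
    static final double HEAP_USED_THRESHOLD = 0.95; // fraction of max heap
    static final double GC_TIME_THRESHOLD = 0.50;   // fraction of wall time in GC

    private long lastGcMs = totalGcTimeMs();
    private long lastWallMs = System.currentTimeMillis();

    /** Accumulated collection time summed over all collectors, in ms. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) { // -1 means the collector does not report its time
                total += t;
            }
        }
        return total;
    }

    /** Used heap as a fraction of max heap, or 0 if max is undefined (-1). */
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max > 0 ? (double) heap.getUsed() / max : 0.0;
    }

    /** Pure decision rule, kept separate so it can be tested without a real heap. */
    static boolean isUnderGcPressure(double heapFraction, long gcDeltaMs, long wallDeltaMs) {
        if (wallDeltaMs <= 0) {
            return false; // no elapsed time, nothing to judge
        }
        double gcFraction = (double) gcDeltaMs / wallDeltaMs;
        return heapFraction >= HEAP_USED_THRESHOLD && gcFraction >= GC_TIME_THRESHOLD;
    }

    /**
     * Samples the interval since the previous call and returns a warning line
     * to log, or null if the JVM did not look GC-bound over that interval.
     */
    String sample() {
        long gcNow = totalGcTimeMs();
        long wallNow = System.currentTimeMillis();
        long gcDelta = gcNow - lastGcMs;
        long wallDelta = wallNow - lastWallMs;
        lastGcMs = gcNow;
        lastWallMs = wallNow;

        double heap = heapUsedFraction();
        if (!isUnderGcPressure(heap, gcDelta, wallDelta)) {
            return null;
        }
        return String.format("WARN GC pressure: heap %.0f%% used, %d ms of last %d ms spent in GC",
                heap * 100, gcDelta, wallDelta);
    }
}
```

A task JVM could, for example, call sample() from a daemon thread every few seconds and write any non-null result to its container log, so a reducer that looks hung would at least leave a trace of the GC thrashing described above. As an alternative that needs no code, HotSpot's own GC logging (for example -verbose:gc) can be switched on through the reducer JVM options (mapreduce.reduce.java.opts).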



--
This message was sent by Atlassian JIRA
(v6.2#6252)
