[
https://issues.apache.org/jira/browse/YARN-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007779#comment-14007779
]
Vinod Kumar Vavilapalli edited comment on YARN-2095 at 5/23/14 11:13 PM:
-------------------------------------------------------------------------
Vinod, could you read the email below? Would you agree that there should be a
log entry from YARN in this case?
Clay,
What I noticed is that your reducers were overloaded and were on the brink of
running out of memory. The Java heaps were running at 99% and continuously
GC’ing while the app was reading from disk. So it was trying its best to
process the job with limited resources. I agree with you that it would be
helpful if the container could put out a log message that there were GC issues
to help with debugging.
Thanks,
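
To make the suggestion above concrete, here is a minimal sketch of how a task JVM could emit such a log line itself. It is not YARN or MapReduce code: the class name HeapPressureLogger, its methods, and the 95% threshold are assumptions made up for illustration. Only the standard java.lang.management beans are used.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper, not part of YARN or MapReduce. */
public class HeapPressureLogger {
    // Assumed threshold: warn once used heap reaches 95% of the maximum heap.
    static final double WARN_FRACTION = 0.95;

    /** Used heap as a fraction of max heap, or -1 when max is undefined. */
    static double heapFraction(long usedBytes, long maxBytes) {
        return maxBytes <= 0 ? -1.0 : (double) usedBytes / (double) maxBytes;
    }

    /** True when the heap is at or above the warning threshold. */
    static boolean underPressure(long usedBytes, long maxBytes) {
        return heapFraction(usedBytes, maxBytes) >= WARN_FRACTION;
    }

    /** Cumulative time in milliseconds spent in GC across all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Returns a warning line when the heap is under pressure, otherwise null. */
    static String checkOnce() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        if (!underPressure(heap.getUsed(), heap.getMax())) {
            return null;
        }
        return String.format("WARN heap at %.0f%% of max, cumulative GC time %d ms",
                100.0 * heapFraction(heap.getUsed(), heap.getMax()), totalGcMillis());
    }

    public static void main(String[] args) {
        String warning = checkOnce();
        if (warning != null) {
            System.err.println(warning);
        }
    }
}
```

A task would need to call checkOnce() periodically, for example from a daemon thread, for the warning to show up in the container logs. Passing the standard JVM flag -verbose:gc to the reducer JVMs may give similar visibility without any code change.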
was (Author: [email protected]):
Vinod, could you read Eric's email below? Would you agree that there should be
a log entry from YARN in this case?
Clay McDonald
Cell: 202.560.4101
Direct: 202.747.5962
From: Eric Mizell [mailto:[email protected]]
Sent: Friday, May 23, 2014 4:18 PM
To: Clay McDonald
Subject: Re: [jira] [Created] (YARN-2095) Large MapReduce Job stops responding
Clay,
What I noticed is that your reducers were overloaded and were on the brink of
running out of memory. The Java heaps were running at 99% and continuously
GC’ing while the app was reading from disk. So it was trying its best to
process the job with limited resources. I agree with you that it would be
helpful if the container could put out a log message that there were GC issues
to help with debugging.
Thanks,
Eric Mizell Director Solution Engineering, Hortonworks
Mobile: 678-761-7623
Email: [email protected]
Website: http://www.hortonworks.com/
> Large MapReduce Job stops responding
> ------------------------------------
>
> Key: YARN-2095
> URL: https://issues.apache.org/jira/browse/YARN-2095
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.2.0
> Environment: CentOS 6.3 (x86_64) on vmware 10 running HDP-2.0.6
> Reporter: Clay McDonald
> Priority: Blocker
>
> Very large jobs (7,455 Mappers and 999 Reducers) hang. Jobs run well, but
> logging to the container logs stops after 33 hours of running. The job appears
> to be hung. The status of the job is "RUNNING". No error messages are found in
> the logs.