[ 
https://issues.apache.org/jira/browse/YARN-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103700#comment-14103700
 ] 

Beckham007 commented on YARN-1800:
----------------------------------

[~vinodkv] [~jlowe] [~vvasudev] I think we shouldn't catch this exception. As 
[~jlowe] mentioned, "NM will be running in a damaged state where every public 
localization will fail the container." Most of those containers will fail, 
but the NM's CPU and memory stay free, so other containers will be assigned to 
it, and those new containers will also fail. This would decrease the throughput 
of the whole cluster. Maybe letting the NM crash would be the better choice.
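
To make the concern concrete, here is a minimal standalone sketch, not NM code. 
It assumes the rejection comes from an executor that has already been shut down, 
which is one common way to hit AbortPolicy; the actual cause in the NM is not 
established here. The class and method names (RejectedSubmitSketch, isRejected) 
are made up for illustration.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class RejectedSubmitSketch {

    // Hypothetical helper: submits one dummy "download" through an
    // ExecutorCompletionService and reports whether the pool rejected it.
    static boolean isRejected(ExecutorService pool) {
        CompletionService<String> queue = new ExecutorCompletionService<>(pool);
        try {
            queue.submit(() -> "downloaded");
            return false;
        } catch (RejectedExecutionException e) {
            // Catching here keeps the caller alive, but the pool stays
            // unusable, so every later submission is rejected as well.
            return true;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        System.out.println("rejected while running: " + isRejected(pool));

        // Assumption for this sketch: the pool has been shut down.
        pool.shutdown();
        System.out.println("rejected after shutdown: " + isRejected(pool));
        System.out.println("rejected on retry: " + isRejected(pool));
    }
}
```

If the handler only catches the exception, the process stays up and keeps 
accepting work that it can no longer localize, which is the situation described 
above.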

> YARN NodeManager with java.util.concurrent.RejectedExecutionException
> ---------------------------------------------------------------------
>
>                 Key: YARN-1800
>                 URL: https://issues.apache.org/jira/browse/YARN-1800
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Paul Isaychuk
>            Assignee: Varun Vasudev
>            Priority: Critical
>             Fix For: 2.4.0
>
>         Attachments: apache-yarn-1800.0.patch, apache-yarn-1800.1.patch, 
> yarn-yarn-nodemanager-host-2.log.zip
>
>
> Noticed this on tests running on Apache Hadoop 2.2 cluster
> {code}
> 2014-01-23 01:30:28,575 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(196)) - Resource 
> hdfs://colo-2:8020/user/fertrist/oozie-oozi/0000605-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar
>  transitioned from INIT to DOWNLOADING
> 2014-01-23 01:30:28,575 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(196)) - Resource 
> hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.splitmetainfo
>  transitioned from INIT to DOWNLOADING
> 2014-01-23 01:30:28,575 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(196)) - Resource 
> hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.split 
> transitioned from INIT to DOWNLOADING
> 2014-01-23 01:30:28,575 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(196)) - Resource 
> hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.xml 
> transitioned from INIT to DOWNLOADING
> 2014-01-23 01:30:28,576 INFO  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:addResource(651)) - Downloading public 
> rsrc:{ 
> hdfs://colo-2:8020/user/fertrist/oozie-oozi/0000605-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar,
>  1390440627435, FILE, null }
> 2014-01-23 01:30:28,576 FATAL event.AsyncDispatcher 
> (AsyncDispatcher.java:dispatch(141)) - Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException
>         at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
>         at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>         at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>         at 
> java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:678)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:583)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:525)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-01-23 01:30:28,577 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:dispatch(144)) - Exiting, bbye..
> 2014-01-23 01:30:28,596 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
> SelectChannelConnector@0.0.0.0:50060
> 2014-01-23 01:30:28,597 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(328)) - 
> Applications still running : [application_1389742077466_0396]
> 2014-01-23 01:30:28,597 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(336)) - Wa
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
