[ 
https://issues.apache.org/jira/browse/YARN-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931151#comment-13931151
 ] 

Vinod Kumar Vavilapalli commented on YARN-1800:
-----------------------------------------------

Catching Throwable vs. only specific exceptions is a balance between keeping 
production up and finding bugs by crashing NodeManagers when such a bug 
happens. For this ticket, I'm leaning towards the latter.
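
To make the trade-off concrete, here is a minimal sketch. It is not the code 
from the patch: DispatchSketch, dispatchNarrow and dispatchBroad are 
hypothetical names used only for illustration. It assumes the handler submits 
work to a thread pool that has already been shut down, which is one way to get 
the RejectedExecutionException shown in the log below.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class DispatchSketch {

    /** Narrow policy: handle only the failure we understand; other bugs propagate. */
    static String dispatchNarrow(Runnable handler) {
        try {
            handler.run();
            return "ok";
        } catch (RejectedExecutionException e) {
            // Known case: the pool is no longer accepting work.
            return "rejected";
        }
    }

    /** Broad policy: the calling thread survives everything, including real bugs. */
    static String dispatchBroad(Runnable handler) {
        try {
            handler.run();
            return "ok";
        } catch (Throwable t) {
            // Keeps the process up, but an unexpected bug is reduced to a string here.
            return "swallowed " + t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.shutdown(); // later execute() calls are rejected by the default AbortPolicy

        Runnable submitsToDeadPool = () -> pool.execute(() -> { });
        Runnable hasBug = () -> { throw new IllegalStateException("unexpected state"); };

        System.out.println(dispatchNarrow(submitsToDeadPool));
        System.out.println(dispatchBroad(submitsToDeadPool));
        System.out.println(dispatchBroad(hasBug));

        try {
            dispatchNarrow(hasBug);
        } catch (IllegalStateException e) {
            // With the narrow policy the bug reaches the caller and is visible.
            System.out.println("bug surfaced: " + e.getMessage());
        }
    }
}
```

The broad variant never lets the calling thread die, at the cost of hiding 
the IllegalStateException. The narrow variant survives the rejection but lets 
the unexpected exception surface, which is the "find bugs by crashing" side 
of the balance.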

The patch looks good. The test fails without the patch and passes with it. 
The dispatcher-exit-on-error isn't strictly necessary, but it's okay.

+1, checking this in for now.

> YARN NodeManager with java.util.concurrent.RejectedExecutionException
> ---------------------------------------------------------------------
>
>                 Key: YARN-1800
>                 URL: https://issues.apache.org/jira/browse/YARN-1800
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Paul Isaychuk
>            Assignee: Varun Vasudev
>            Priority: Critical
>         Attachments: apache-yarn-1800.0.patch, apache-yarn-1800.1.patch, 
> yarn-yarn-nodemanager-host-2.log.zip
>
>
> Noticed this on tests running on Apache Hadoop 2.2 cluster
> {code}
> 2014-01-23 01:30:28,575 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(196)) - Resource 
> hdfs://colo-2:8020/user/fertrist/oozie-oozi/0000605-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar
>  transitioned from INIT to DOWNLOADING
> 2014-01-23 01:30:28,575 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(196)) - Resource 
> hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.splitmetainfo
>  transitioned from INIT to DOWNLOADING
> 2014-01-23 01:30:28,575 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(196)) - Resource 
> hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.split 
> transitioned from INIT to DOWNLOADING
> 2014-01-23 01:30:28,575 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(196)) - Resource 
> hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.xml 
> transitioned from INIT to DOWNLOADING
> 2014-01-23 01:30:28,576 INFO  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:addResource(651)) - Downloading public 
> rsrc:{ 
> hdfs://colo-2:8020/user/fertrist/oozie-oozi/0000605-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar,
>  1390440627435, FILE, null }
> 2014-01-23 01:30:28,576 FATAL event.AsyncDispatcher 
> (AsyncDispatcher.java:dispatch(141)) - Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException
>         at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
>         at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>         at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>         at 
> java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:678)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:583)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:525)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-01-23 01:30:28,577 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:dispatch(144)) - Exiting, bbye..
> 2014-01-23 01:30:28,596 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
> [email protected]:50060
> 2014-01-23 01:30:28,597 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(328)) - 
> Applications still running : [application_1389742077466_0396]
> 2014-01-23 01:30:28,597 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(336)) - Wa
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
