[ https://issues.apache.org/jira/browse/YARN-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931151#comment-13931151 ]
Vinod Kumar Vavilapalli commented on YARN-1800:
-----------------------------------------------
Catching Throwable vs only specific exceptions is a balance between keeping
production up and finding bugs by crashing NodeManagers when such a bug
happens. For this ticket, I'm leaning towards the latter.
The patch looks good. The test fails without the patch and passes with it. Notably
the dispatcher-exit-on-error isn't necessary, but it's okay.
+1, checking this in for now.
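
The trade-off in the first paragraph can be sketched in a few lines of Java. This is only an illustration, not the code in the attached patch: the Handler interface, the two dispatch methods and the REJECTING/BUGGY handlers are hypothetical names invented for the example. The narrow variant absorbs only the known RejectedExecutionException and lets any other failure propagate (so a real bug still crashes the caller), while the broad variant keeps running no matter what is thrown.

```java
import java.util.concurrent.RejectedExecutionException;

public class CatchPolicySketch {

    /** Hypothetical stand-in for an event handler called by a dispatcher. */
    interface Handler {
        void handle(String event);
    }

    /** Handler that fails the way the quoted log shows (assumed for illustration). */
    static final Handler REJECTING = event -> {
        throw new RejectedExecutionException("executor refused " + event);
    };

    /** Handler with an unrelated programming error. */
    static final Handler BUGGY = event -> {
        throw new IllegalStateException("unexpected bug while handling " + event);
    };

    /** Narrow policy: only the known failure is absorbed; anything else propagates. */
    static String dispatchNarrow(Handler handler, String event) {
        try {
            handler.handle(event);
            return "ok";
        } catch (RejectedExecutionException e) {
            return "rejected";
        }
    }

    /** Broad policy: every Throwable is absorbed, so unknown bugs are hidden too. */
    static String dispatchBroad(Handler handler, String event) {
        try {
            handler.handle(event);
            return "ok";
        } catch (Throwable t) {
            return "swallowed " + t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatchNarrow(REJECTING, "e1"));
        System.out.println(dispatchBroad(BUGGY, "e2"));
        try {
            dispatchNarrow(BUGGY, "e3");
        } catch (IllegalStateException e) {
            // The narrow policy surfaces the bug instead of hiding it.
            System.out.println("propagated: " + e.getMessage());
        }
    }
}
```

With the narrow policy the unexpected IllegalStateException reaches the caller, which is the "find bugs by crashing" side of the balance; the broad policy trades that visibility for uptime.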
> YARN NodeManager with java.util.concurrent.RejectedExecutionException
> ---------------------------------------------------------------------
>
> Key: YARN-1800
> URL: https://issues.apache.org/jira/browse/YARN-1800
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Paul Isaychuk
> Assignee: Varun Vasudev
> Priority: Critical
> Attachments: apache-yarn-1800.0.patch, apache-yarn-1800.1.patch,
> yarn-yarn-nodemanager-host-2.log.zip
>
>
> Noticed this on tests running on Apache Hadoop 2.2 cluster
> {code}
> 2014-01-23 01:30:28,575 INFO localizer.LocalizedResource
> (LocalizedResource.java:handle(196)) - Resource
> hdfs://colo-2:8020/user/fertrist/oozie-oozi/0000605-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar
> transitioned from INIT to DOWNLOADING
> 2014-01-23 01:30:28,575 INFO localizer.LocalizedResource
> (LocalizedResource.java:handle(196)) - Resource
> hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.splitmetainfo
> transitioned from INIT to DOWNLOADING
> 2014-01-23 01:30:28,575 INFO localizer.LocalizedResource
> (LocalizedResource.java:handle(196)) - Resource
> hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.split
> transitioned from INIT to DOWNLOADING
> 2014-01-23 01:30:28,575 INFO localizer.LocalizedResource
> (LocalizedResource.java:handle(196)) - Resource
> hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.xml
> transitioned from INIT to DOWNLOADING
> 2014-01-23 01:30:28,576 INFO localizer.ResourceLocalizationService
> (ResourceLocalizationService.java:addResource(651)) - Downloading public
> rsrc:{
> hdfs://colo-2:8020/user/fertrist/oozie-oozi/0000605-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar,
> 1390440627435, FILE, null }
> 2014-01-23 01:30:28,576 FATAL event.AsyncDispatcher
> (AsyncDispatcher.java:dispatch(141)) - Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException
> at
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
> at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
> at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
> at
> java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:678)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:583)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:525)
> at
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
> at
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
> at java.lang.Thread.run(Thread.java:662)
> 2014-01-23 01:30:28,577 INFO event.AsyncDispatcher
> (AsyncDispatcher.java:dispatch(144)) - Exiting, bbye..
> 2014-01-23 01:30:28,596 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped
> [email protected]:50060
> 2014-01-23 01:30:28,597 INFO containermanager.ContainerManagerImpl
> (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(328)) -
> Applications still running : [application_1389742077466_0396]
> 2014-01-23 01:30:28,597 INFO containermanager.ContainerManagerImpl
> (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(336)) - Wa
> {code}
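
For readers unfamiliar with the exception in the trace above, the following is a minimal sketch of one way ExecutorCompletionService.submit can end in RejectedExecutionException: the underlying executor has already been shut down. That the NodeManager's executor was in this state is an assumption made for the example; the log itself does not say why the task was rejected, and the class and method names below are hypothetical.

```java
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class RejectedSubmitSketch {

    /** Returns true if submitting to a shut-down pool is rejected. */
    static boolean submitAfterShutdownIsRejected() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        ExecutorCompletionService<String> queue = new ExecutorCompletionService<>(pool);

        // Assumed trigger: the pool stops accepting work before the submit call.
        pool.shutdown();

        try {
            // submit() hands the task to pool.execute(), which applies the
            // default AbortPolicy and throws for a shut-down executor.
            queue.submit(() -> "downloaded");
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + submitAfterShutdownIsRejected());
    }
}
```

The call path (ExecutorCompletionService.submit, then ThreadPoolExecutor.execute, then the AbortPolicy handler) matches the top frames of the quoted stack trace.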