[jira] [Commented] (YARN-6607) YARN Resource Manager quits with the exception java.util.concurrent.RejectedExecutionException:

2019-01-18 Thread Feng Yuan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746053#comment-16746053
 ] 

Feng Yuan commented on YARN-6607:
-

Hi [~djp], could you give some advice about the patch?

> YARN Resource Manager quits with the exception 
> java.util.concurrent.RejectedExecutionException: 
> 
>
> Key: YARN-6607
> URL: https://issues.apache.org/jira/browse/YARN-6607
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anandhaprabhu
>Priority: Major
> Attachments: YARN-6607.branch-2.7.1
>
>
> ResourceManager goes down frequently with the below exception
> 2017-05-16 03:32:36,897 FATAL event.AsyncDispatcher 
> (AsyncDispatcher.java:dispatch(189)) - Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> java.util.concurrent.FutureTask@9efeac9 rejected from 
> java.util.concurrent.ThreadPoolExecutor@42ab30[Shutting down, pool size = 16, 
> active threads = 0, queued tasks = 0, completed tasks = 223337]
> at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
> at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
> at 
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
> at 
> org.apache.hadoop.registry.server.services.RegistryAdminService.submit(RegistryAdminService.java:176)
> at 
> org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.purgeRecordsAsync(RMRegistryOperationsService.java:200)
> at 
> org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.purgeRecordsAsync(RMRegistryOperationsService.java:170)
> at 
> org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.onContainerFinished(RMRegistryOperationsService.java:146)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService.handleAppAttemptEvent(RMRegistryService.java:151)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService$AppEventHandler.handle(RMRegistryService.java:183)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService$AppEventHandler.handle(RMRegistryService.java:177)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$MultiListenerHandler.handle(AsyncDispatcher.java:276)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> 2017-05-16 03:32:36,898 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(524)) 
> - EventThread shut down
> 2017-05-16 03:32:36,898 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) 
> - Session: 0x15b8703e986b750 closed
> 2017-05-16 03:32:36,898 INFO  capacity.ParentQueue 
> (ParentQueue.java:completedContainer(623)) - completedContainer queue=high 
> usedCapacity=0.41496983 absoluteUsedCapacity=0.29047886 used= vCores:847> cluster=
> 2017-05-16 03:32:36,905 INFO  capacity.ParentQueue 
> (ParentQueue.java:completedContainer(640)) - Re-sorting completed queue: 
> root.high.lawful stats: lawful: capacity=0.3, absoluteCapacity=0.2101, 
> usedResources=, usedCapacity=0.16657583, 
> absoluteUsedCapacity=0.034980923, numApps=19, numContainers=102
> 2017-05-16 03:32:36,905 INFO  capacity.ParentQueue 
> (ParentQueue.java:completedContainer(623)) - completedContainer queue=root 
> usedCapacity=0.41565567 absoluteUsedCapacity=0.41565567 used= vCores:1212> cluster=
> 2017-05-16 03:32:36,906 INFO  capacity.ParentQueue 
> (ParentQueue.java:completedContainer(640)) - Re-sorting completed queue: 
> root.high stats: high: numChildQueue= 4, capacity=0.7, absoluteCapacity=0.7, 
> usedResources=usedCapacity=0.41496983, 
> numApps=61, numContainers=847
> 2017-05-16 03:32:36,906 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:completedContainer(1562)) - Application attempt 
> appattempt_1494886223429_7023_01 released container 
> container_e43_1494886223429_7023_01_43 on node: host: 
> r13d8.hadoop.log10.blackberry:45454 #containers=1 available= vCores:23> used= with event: FINISHED



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6607) YARN Resource Manager quits with the exception java.util.concurrent.RejectedExecutionException:

2019-01-18 Thread Feng Yuan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745946#comment-16745946
 ] 

Feng Yuan commented on YARN-6607:
-

Hi [~gouri shankar], I am sorry for replying so late; this issue got lost among 
my other work over the past two years.
If the cause is that the thread pool limit was reached, I have submitted a patch above.
However, I still suspect there is some other FATAL problem behind this,
like: XXX happens -> service stops -> thread pool stops -> executor.submit 
is rejected.
So the question is whether such an XXX exists.
I hope to hear your thoughts.
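
To illustrate the last two steps of that chain (thread pool stops -> executor.submit is rejected), here is a minimal standalone sketch. It is not the RM or registry code: the class name RejectAfterShutdown and the helper guardedSubmit are hypothetical, and the guard is only one possible way to tolerate a stopped pool, not necessarily what the attached patch does. Both quoted traces report the pool as "Shutting down" or "Terminated", which is the state reproduced here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class; not part of Hadoop.
class RejectAfterShutdown {

    // Returns true if submitting to an already shut-down pool is rejected.
    static boolean submitIsRejectedAfterShutdown() {
        // Default rejection handler is AbortPolicy, as in the quoted trace.
        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.shutdown(); // stands in for "service stop -> threadpool stop"
        Runnable task = () -> { };
        try {
            executor.submit(task);
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    // Hypothetical guard: returns false instead of letting the exception
    // propagate to the caller (in the RM, the caller is the dispatcher thread).
    static boolean guardedSubmit(ExecutorService executor, Runnable task) {
        if (executor.isShutdown()) {
            return false;
        }
        try {
            executor.submit(task);
            return true;
        } catch (RejectedExecutionException e) {
            // The pool may be shut down between the check and the submit.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: "
                + submitIsRejectedAfterShutdown());

        ExecutorService live = Executors.newSingleThreadExecutor();
        System.out.println("accepted while running: "
                + guardedSubmit(live, () -> { }));
        live.shutdown();
        System.out.println("accepted after shutdown: "
                + guardedSubmit(live, () -> { }));
    }
}
```

A guard like this would only hide the symptom. It does not answer what stopped the service in the first place, which is the open question above.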







[jira] [Commented] (YARN-6607) YARN Resource Manager quits with the exception java.util.concurrent.RejectedExecutionException:

2018-03-20 Thread gouri shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405907#comment-16405907
 ] 

gouri shankar commented on YARN-6607:
-

Stack Trace

FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(190)) - Error in 
dispatcher thread
java.util.concurrent.RejectedExecutionException: Task 
java.util.concurrent.FutureTask@75755d80 rejected from 
java.util.concurrent.ThreadPoolExecutor@5a46169[Terminated, pool size = 0, 
active threads = 0, queued tasks = 0, completed tasks = 80564]
 at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
 at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
 at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
 at 
org.apache.hadoop.registry.server.services.RegistryAdminService.submit(RegistryAdminService.java:176)
 at 
org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.purgeRecordsAsync(RMRegistryOperationsService.java:200)
 at 
org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.purgeRecordsAsync(RMRegistryOperationsService.java:170)
 at 
org.apache.hadoop.registry.server.integration.RMRegistryOperationsService.onContainerFinished(RMRegistryOperationsService.java:146)
 at 
org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService.handleAppAttemptEvent(RMRegistryService.java:156)
 at 
org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService$AppEventHandler.handle(RMRegistryService.java:188)
 at 
org.apache.hadoop.yarn.server.resourcemanager.registry.RMRegistryService$AppEventHandler.handle(RMRegistryService.java:182)
 at 
org.apache.hadoop.yarn.event.AsyncDispatcher$MultiListenerHandler.handle(AsyncDispatcher.java:279)
 at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
 at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
 at java.lang.Thread.run(Thread.java:745)
2018-03-20 06:27:52,787 INFO resourcemanager.RMAuditLogger 
(RMAuditLogger.java:logSuccess(141)) - USER=svc-tap-prod OPERATION=AM Released 
Container TARGET=SchedulerApp RESULT=SUCCESS 
APPID=application_1521520492173_0200 
CONTAINERID=container_e335_1521520492173_0200_01_001733
2018-03-20 06:27:52,851 INFO scheduler.SchedulerNode 
(SchedulerNode.java:releaseContainer(220)) - Released container 
container_e335_1521520492173_0200_01_001733 of capacity  on host pphdpworkr119xx.global.tesco.org:45454, which currently has 
17 containers,  used and  
available, release resources=true
2018-03-20 06:27:52,851 INFO zookeeper.ClientCnxn 
(ClientCnxn.java:primeConnection(864)) - Socket connection established, 
initiating session, client: /172.17.136.13:44794, server: 
pphdpmanag002xx.global.tesco.org/172.17.136.12:2181
2018-03-20 06:27:52,819 INFO zookeeper.ClientCnxn 
(ClientCnxn.java:primeConnection(864)) - Socket connection established, 
initiating session, client: /172.17.136.13:50905, server: 
pphdpmanag003xx.global.tesco.org/172.17.136.13:2181
2018-03-20 06:27:52,882 INFO event.AsyncDispatcher 
(AsyncDispatcher.java:serviceStop(142)) - AsyncDispatcher is draining to stop, 
igonring any new events.
2018-03-20 06:27:52,882 INFO zookeeper.ClientCnxn 
(ClientCnxn.java:onConnected(1279)) - Session establishment complete on server 
pphdpmanag003xx.global.tesco.org/172.17.136.13:2181, sessionid = 
0x2622a2c989a06e8, negotiated timeout = 1
2018-03-20 06:27:52,882 INFO zookeeper.ClientCnxn 
(ClientCnxn.java:onConnected(1279)) - Session establishment complete on server 
pphdpmanag002xx.global.tesco.org/172.17.136.12:2181, sessionid = 
0x1622cd06bfd05b0, negotiated timeout = 1
2018-03-20 06:27:52,883 INFO cloud.ConnectionManager 
(ConnectionManager.java:process(104)) - Watcher 
org.apache.solr.common.cloud.ConnectionManager@3882ddf4 
name:ZooKeeperConnection 
Watcher:pphdpmanag002xx.global.tesco.org:2181,pphdpmanag003xx.global.tesco.org:2181,pphdpmanag004xx.global.tesco.org:2181/infra-solr
 got event WatchedEvent state:SyncConnected type:None path:null path:null 
type:None
2018-03-20 06:27:52,945 INFO cloud.ConnectionManager 
(ConnectionManager.java:waitForConnected(230)) - Client is connected to 
ZooKeeper
2018-03-20 06:27:52,945 INFO util.AbstractLivelinessMonitor 
(AbstractLivelinessMonitor.java:run(139)) - AMLivelinessMonitor thread 
interrupted
2018-03-20 06:27:52,945 INFO util.AbstractLivelinessMonitor 
(AbstractLivelinessMonitor.java:run(139)) - AMLivelinessMonitor thread 
interrupted
2018-03-20 06:27:52,946 ERROR delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:run(659)) - ExpiredTokenRemover 
received java.lang.InterruptedException: sleep interrupted
2018-03-20 

[jira] [Commented] (YARN-6607) YARN Resource Manager quits with the exception java.util.concurrent.RejectedExecutionException:

2017-05-23 Thread Feng Yuan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1605#comment-1605
 ] 

Feng Yuan commented on YARN-6607:
-

Is there any stacktrace?



