Arun Suresh created YARN-7086:
---------------------------------
Summary: Release all containers asynchronously
Key: YARN-7086
URL: https://issues.apache.org/jira/browse/YARN-7086
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Arun Suresh
Assignee: Arun Suresh
We have noticed two situations in production that can cause deadlocks and
bring the scheduling of new containers to a halt, especially for applications
that have a lot of live containers:
# When these applications release their containers in bulk.
# When these applications terminate abruptly due to some failure, and the
scheduler releases all of their live containers in a loop.
To handle the issues mentioned above, we have a patch in production to make
sure ALL container releases happen asynchronously - and it has served us well.
Opening this JIRA to gather feedback on whether this is a good idea in general
(cc [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd])
BTW, in YARN-6251, we already have an asyncReleaseContainer() in the
AbstractYarnScheduler and a corresponding scheduler event, which is currently
used specifically for the container-update code paths (where the scheduler
releases the temporary containers it creates for the update).
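To make the general idea concrete, here is a minimal sketch of releasing
containers through a queued event instead of inline in the caller's loop. This
is not the production patch and not Hadoop code: AsyncReleaseSketch,
releaseAllAsync(), completeContainer() and drain() are hypothetical names, and
a single-threaded executor stands in for the scheduler's event dispatcher.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a scheduler; none of these names are YARN APIs.
final class AsyncReleaseSketch {
    // IDs of containers that are currently live.
    private final Set<String> liveContainers = ConcurrentHashMap.newKeySet();

    // Assumption: one dispatcher thread processes release events in order.
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    void allocate(String containerId) {
        liveContainers.add(containerId);
    }

    // Enqueue one release event per container and return immediately, so the
    // caller never does the release work (or waits on the lock) itself.
    void releaseAllAsync(List<String> containerIds) {
        for (String id : containerIds) {
            dispatcher.execute(() -> completeContainer(id));
        }
    }

    // The release itself runs on the dispatcher thread, one event at a time.
    private synchronized void completeContainer(String containerId) {
        liveContainers.remove(containerId);
    }

    int liveCount() {
        return liveContainers.size();
    }

    // Stop accepting events and wait for the queued releases to finish.
    boolean drain(long timeoutMillis) {
        dispatcher.shutdown();
        try {
            return dispatcher.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch a bulk release of N containers costs the caller only N enqueue
operations; the actual cleanup is serialized on the dispatcher thread.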