[
https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020191#comment-15020191
]
Naganarasimha G R commented on YARN-3784:
-----------------------------------------
Thanks [~sunilg] for the offline update on this issue. I was missing the part
where FiCaSchedulerApp subtracts the elapsed time, which is how you are able to
send the effective (remaining) timeout to the AM during the heartbeat. However,
I see the following issues with the current approach compared to computing a
probable timestamp (current time + preemption timeout) when the
PreemptionContainer is created and sharing that with the AM:
* There can be a small delta between the reported timeout value and the time at
which the container actually times out.
* Some additional loops are needed while building the heartbeat response (not a
significant performance impact, but avoidable nevertheless).
* The additional storage of {{containersWithFirstNotifyTime}} in
FiCaSchedulerApp, which a timestamp would avoid.
That said, the current approach may be simpler for users, since a numerical
duration is easier to understand than a timestamp. Thoughts from others?
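To make the trade-off concrete, here is a minimal Java sketch of both options, assuming string container IDs and caller-supplied clock values in milliseconds. {{RemainingDurationTracker}} and {{deadlineMs}} are hypothetical names, not code from the patch or from FiCaSchedulerApp. In this sketch the tracker starts a container's clock at the first heartbeat that mentions it rather than at preemption-container creation, which illustrates where a small delta can creep in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting the two ways of reporting the preemption timeout.
// Not the YARN-3784 patch: container IDs are plain strings and the current time is
// passed in explicitly (milliseconds) so the example is deterministic.
class PreemptionTimeoutSketch {

    // Approach in the patch (as described in the comment): remember when the AM was
    // first notified about a container, then on each heartbeat report timeout - elapsed.
    static final class RemainingDurationTracker {
        private final long timeoutMs;
        private final Map<String, Long> firstNotifyTime = new HashMap<>();

        RemainingDurationTracker(long timeoutMs) {
            this.timeoutMs = timeoutMs;
        }

        long remainingMs(String containerId, long nowMs) {
            // The first heartbeat that mentions the container starts its clock.
            long first = firstNotifyTime.computeIfAbsent(containerId, k -> nowMs);
            // Clamp at zero once the timeout has fully elapsed.
            return Math.max(0L, timeoutMs - (nowMs - first));
        }
    }

    // Alternative proposed in the comment: compute an absolute deadline once, when the
    // preemption container is created, and send the same value on every heartbeat.
    // No per-container map and no per-heartbeat arithmetic are needed.
    static long deadlineMs(long creationTimeMs, long timeoutMs) {
        return creationTimeMs + timeoutMs;
    }

    public static void main(String[] args) {
        RemainingDurationTracker tracker = new RemainingDurationTracker(15_000L);
        System.out.println(tracker.remainingMs("c1", 1_000L)); // 15000 (first notification)
        System.out.println(tracker.remainingMs("c1", 6_000L)); // 10000 (5 s later)
        System.out.println(deadlineMs(1_000L, 15_000L));       // 16000 (fixed deadline)
    }
}
```

The tracker needs one map entry per container and recomputes on every heartbeat, while the deadline is a single addition; the cost of the latter is that the AM must interpret an absolute timestamp instead of a plain duration.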
Also, a few issues with the patch:
* Possible leak in {{containersWithFirstNotifyTime}}, since remove is never
called?
* Can there be a case where {{containersWithFirstNotifyTime}} is not filled in
for a preempted container? If not, I feel the additional check {{if
(containersWithFirstNotifyTime.containsKey(c))}} in the for loop is not
required.
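Both patch points could be addressed along the lines of the following minimal sketch. It assumes the map is keyed by container ID and that the set of currently preempted containers is available when building the heartbeat response; the class and method names are hypothetical and do not come from FiCaSchedulerApp.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the two bookkeeping points raised above. Names are
// illustrative; they are not taken from FiCaSchedulerApp or the patch.
class NotifyTimeBookkeeping {
    private final Map<String, Long> containersWithFirstNotifyTime = new HashMap<>();

    // Builds the remaining timeout per preempted container for one heartbeat response.
    Map<String, Long> remainingTimeouts(Set<String> preempted, long timeoutMs, long nowMs) {
        // Prune entries for containers no longer marked for preemption (completed,
        // killed or un-marked) so the map cannot grow without bound.
        containersWithFirstNotifyTime.keySet().retainAll(preempted);

        Map<String, Long> result = new HashMap<>();
        for (String c : preempted) {
            // computeIfAbsent guarantees an entry for every preempted container,
            // so no separate containsKey(c) check is needed inside the loop.
            long first = containersWithFirstNotifyTime.computeIfAbsent(c, k -> nowMs);
            result.put(c, Math.max(0L, timeoutMs - (nowMs - first)));
        }
        return result;
    }

    // Explicit cleanup hook, e.g. called when a container finishes.
    void containerCompleted(String containerId) {
        containersWithFirstNotifyTime.remove(containerId);
    }

    int trackedCount() {
        return containersWithFirstNotifyTime.size();
    }

    public static void main(String[] args) {
        NotifyTimeBookkeeping b = new NotifyTimeBookkeeping();
        System.out.println(b.remainingTimeouts(Set.of("c1", "c2"), 10_000L, 0L));
        // c1 is no longer preempted, so its entry is pruned on the next heartbeat.
        System.out.println(b.remainingTimeouts(Set.of("c2"), 10_000L, 4_000L)); // {c2=6000}
        System.out.println(b.trackedCount()); // 1
    }
}
```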
> Indicate preemption timeout along with the list of containers to AM
> (preemption message)
> ---------------------------------------------------------------------------------------
>
> Key: YARN-3784
> URL: https://issues.apache.org/jira/browse/YARN-3784
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Sunil G
> Assignee: Sunil G
> Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch,
> 0003-YARN-3784.patch, 0004-YARN-3784.patch
>
>
> Currently, during preemption, the AM is notified with a list of containers
> that are marked for preemption. This issue introduces a timeout duration
> along with this container list so that the AM knows how much time it has to
> gracefully shut down its containers (assuming one of the preemption policies
> is loaded in the AM).
> This will help in NM decommissioning scenarios, where the NM is
> decommissioned after a timeout (also killing the containers on it). The
> timeout tells the AM that those containers can be forcefully killed by the
> RM once it expires.
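On the AM side, the timeout lets the AM decide which containers it can still wind down gracefully. The following is a minimal sketch under stated assumptions: {{PreemptionNotice}} is a hypothetical holder for the container IDs plus a timeout in milliseconds, and the per-container shutdown estimates are something the AM would supply itself; neither is part of the YARN API or the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an AM consuming a preemption message that carries a timeout.
// PreemptionNotice and its fields are illustrative only; they are not the YARN API.
class PreemptionNoticeSketch {

    static final class PreemptionNotice {
        final List<String> containerIds;
        final long timeoutMs; // time left before the RM may forcefully kill the containers

        PreemptionNotice(List<String> containerIds, long timeoutMs) {
            this.containerIds = List.copyOf(containerIds);
            this.timeoutMs = timeoutMs;
        }
    }

    // Returns, in message order, the containers whose estimated graceful shutdown,
    // started at nowMs, finishes before the RM's forced kill.
    static List<String> gracefullyStoppable(PreemptionNotice notice, long receivedAtMs,
                                            long nowMs, Map<String, Long> shutdownEstimateMs) {
        // Turn the relative timeout into a deadline on the AM's own clock.
        long deadlineMs = receivedAtMs + notice.timeoutMs;
        List<String> result = new ArrayList<>();
        for (String id : notice.containerIds) {
            long estimate = shutdownEstimateMs.getOrDefault(id, 0L);
            if (nowMs + estimate <= deadlineMs) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PreemptionNotice notice = new PreemptionNotice(List.of("c1", "c2"), 10_000L);
        Map<String, Long> estimates = Map.of("c1", 3_000L, "c2", 12_000L);
        System.out.println(gracefullyStoppable(notice, 0L, 1_000L, estimates)); // [c1]
    }
}
```

Converting the duration into a deadline when the message arrives means the AM only compares against its own clock, which is one possible argument for sending a numerical duration rather than an RM-side timestamp.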