Xuan Gong commented on YARN-2261:
Thanks for the comments, Steve.
bq. Maybe the cleanup containers could have lower limits on allocation: 1 vcore
max... I'd advocate less memory, but if pmem limits are turned on that's
bq. Would there be any actual/best-effort offerings of the interval between AM
termination and clean-up scheduling?
I have thought about this. These are the options:
* Request the resource for the clean-up container separately, after the
application has finished, failed, or been killed. In this case, the clean-up
container can have its own resource requirement. However, as Vinod commented,
the clean-up container may not get resources, because the cluster may have
become busy after the final AM exited.
* Request the resource for the clean-up container at the same time as the
resource for the AM container, and reserve it. After the final AM exits, we
use this reserved resource to launch the clean-up container. In this case, the
clean-up container can also have its own resource requirement. But this option
is not ideal, because the AM does not know whether its attempt is the final
one. Even the RM does not know whether the current attempt is final; it only
knows whether an attempt was the final one after that attempt ends, when it
decides whether to launch the next attempt. So we would need to request the
clean-up container resource every time we request a resource for an AM
container, and if the current AM attempt is not the final one, we will waste
the reserved resource.
* Reuse the AM container's resource, as I proposed. Once the container
resizing feature is ready, we could also let the clean-up container have its
own resource requirement.
Those are all the options I can think of for scheduling the clean-up
container, and that is why I propose simply reusing the AM container's resource.
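To put a number on the waste in the second option, here is a minimal sketch. It assumes, hypothetically, that each AM attempt holds a fixed-size, memory-only clean-up reservation and that only the last attempt turns out to be final; the class and method names are illustrative and are not YARN APIs.

```java
/**
 * Hypothetical accounting for the "reserve with every AM attempt" option:
 * a clean-up reservation is made alongside every AM attempt, because neither
 * the AM nor the RM knows in advance which attempt will be the final one.
 * Memory-only, illustrative; not a YARN API.
 */
final class CleanupReservationCost {

    private CleanupReservationCost() {}

    /** Total clean-up memory reserved across all AM attempts, in MB. */
    static long reservedMb(int attempts, long cleanupMb) {
        if (attempts < 1 || cleanupMb < 0) {
            throw new IllegalArgumentException("need attempts >= 1 and cleanupMb >= 0");
        }
        return attempts * cleanupMb;
    }

    /**
     * Memory reserved for non-final attempts, in MB. Only the final attempt's
     * reservation is ever used to launch the clean-up container.
     */
    static long wastedMb(int attempts, long cleanupMb) {
        return reservedMb(attempts, cleanupMb) - cleanupMb;
    }

    public static void main(String[] args) {
        // Example: the application needed 3 AM attempts, reserving 1024 MB each time.
        System.out.println("reserved=" + reservedMb(3, 1024) + " MB, wasted="
                + wastedMb(3, 1024) + " MB"); // reserved=3072 MB, wasted=2048 MB
    }
}
```

Under the third option (reusing the AM container's resource) the extra reservation is zero by construction; the trade-off is that the clean-up container is limited to the AM's allocation until container resizing is available.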
bq. My token concern is related to long-lived apps: what tokens will they get?
For now, we could just give the clean-up container all the latest tokens the
AM has. I understand that this is not enough for LRS (long-running service)
apps. But I think the AM has a similar token renewal/update issue, and we could
fix those
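A minimal sketch of the "give it the AM's latest tokens" approach, and of why it falls short for long-running apps: the clean-up container gets a copy taken at launch, so tokens the AM obtains afterwards never reach it. Tokens are modeled here as plain strings under a hypothetical alias; no real Hadoop token or credentials classes are used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: at launch time the clean-up container receives a
 * read-only copy of the latest tokens the AM holds. Token values are plain
 * strings keyed by an illustrative alias ("hdfs"); not a Hadoop API.
 */
final class CleanupTokens {

    private CleanupTokens() {}

    /** Copies the AM's current tokens; later AM-side updates are not visible. */
    static Map<String, String> snapshotForCleanup(Map<String, String> amTokens) {
        return Collections.unmodifiableMap(new HashMap<>(amTokens));
    }

    public static void main(String[] args) {
        Map<String, String> amTokens = new HashMap<>();
        amTokens.put("hdfs", "token-v1");

        Map<String, String> cleanupTokens = snapshotForCleanup(amTokens);

        // Later the AM obtains a replacement token, as a long-running app would.
        amTokens.put("hdfs", "token-v2");

        // The clean-up container still sees the value captured at launch.
        System.out.println(cleanupTokens.get("hdfs")); // token-v1
    }
}
```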
bq. How does this mix up with pre-emption?
This is a good point. The clean-up container's resource still counts as part
of the application's resources. I think we could do one of the following:
* Make clean-up containers non-pre-emptible.
* If the clean-up container is pre-empted, simply stop the clean-up process
without retrying, and mark it as a clean-up failure.
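The two pre-emption options can be sketched side by side. This is a hypothetical illustration only: the Policy enum, Container class, and method names are stand-ins, not YARN scheduler or pre-emption-policy classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the two pre-emption options for clean-up containers.
 * The types below are illustrative stand-ins, not YARN scheduler classes.
 */
final class CleanupPreemption {

    private CleanupPreemption() {}

    enum Policy { NEVER_PREEMPT_CLEANUP, PREEMPT_AND_FAIL_CLEANUP }

    enum CleanupResult { SUCCEEDED, FAILED }

    static final class Container {
        final String id;
        final boolean cleanup;

        Container(String id, boolean cleanup) {
            this.id = id;
            this.cleanup = cleanup;
        }
    }

    /** Option 1: clean-up containers are never offered as pre-emption candidates. */
    static List<Container> preemptionCandidates(List<Container> running, Policy policy) {
        List<Container> candidates = new ArrayList<>();
        for (Container c : running) {
            if (policy == Policy.NEVER_PREEMPT_CLEANUP && c.cleanup) {
                continue; // protected from pre-emption
            }
            candidates.add(c);
        }
        return candidates;
    }

    /** Option 2: a pre-empted clean-up container is not retried; report failure. */
    static CleanupResult onCleanupPreempted(Container c) {
        if (!c.cleanup) {
            throw new IllegalArgumentException(c.id + " is not a clean-up container");
        }
        return CleanupResult.FAILED; // deliberately no retry is scheduled
    }

    public static void main(String[] args) {
        List<Container> running = Arrays.asList(
                new Container("container_a", false), new Container("cleanup_1", true));
        System.out.println(preemptionCandidates(running, Policy.NEVER_PREEMPT_CLEANUP).size());    // 1
        System.out.println(preemptionCandidates(running, Policy.PREEMPT_AND_FAIL_CLEANUP).size()); // 2
        System.out.println(onCleanupPreempted(running.get(1)));                                    // FAILED
    }
}
```

The first option makes clean-up more reliable but lets a finished application hold resources the scheduler cannot reclaim; the second keeps pre-emption unchanged at the cost of clean-up sometimes not completing.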
> YARN should have a way to run post-application cleanup
> Key: YARN-2261
> URL: https://issues.apache.org/jira/browse/YARN-2261
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> See MAPREDUCE-5956 for context. Specific options are at
This message was sent by Atlassian JIRA