Xuan Gong commented on YARN-2261:

Thanks for the comments, Steve.

bq. Maybe the cleanup containers could have lower limits on allocation: 1 vcore
max...I'd advocate less memory, but if pmem limits are turned on that's

bq. would there be any actual/best effort offerings of the interval between AM 
termination and clean up scheduling?

I thought about this. I see three options:
* Request the resource for the clean-up container separately, after the
application has finished, failed, or been killed. In this case, the clean-up
container can have its own resource requirement. But, as Vinod commented, the
clean-up container may not get resources, because the cluster may have become
busy after the final AM exits.
* Request the resource for the clean-up container at the same time as we
request the resource for the AM container, and reserve it. After the final AM
exits, we use this reserved resource to launch the clean-up container. In this
case, too, the clean-up container can have its own resource requirement. But
this option is not ideal, because the AM does not know whether it is the final
attempt. Even the RM does not know whether the current attempt is the final
one; it only knows whether the previous attempt was final when it decides
whether it needs to launch the next attempt. So we would need to request the
resource for the clean-up container every time we request the resource for an
AM container. If the current AM container is not the final one, we will waste the
* Reuse the AM container resource, as I proposed. Once the container-resize
feature is ready, we could definitely let the clean-up container have its own
resource requirement.
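
To make the trade-offs between the three options easier to compare, here is a rough toy model in Java. It is only a sketch: every name in it (CleanupSchedulingSketch, Strategy, wastedReservations, and so on) is hypothetical and none of it is existing YARN API. It also assumes that the reused AM resource is still held by the application when the final AM exits, and that with the reserve option only the final attempt's reservation is ever used.

```java
/**
 * Toy model of the three clean-up scheduling options.
 * All names are hypothetical; this is not YARN API.
 */
final class CleanupSchedulingSketch {

    enum Strategy {
        REQUEST_AFTER_EXIT,  // ask the RM only after the application is done
        RESERVE_WITH_AM,     // reserve alongside every AM container request
        REUSE_AM_CONTAINER   // run clean-up in the resource the AM held
    }

    /** Can the clean-up container be sized independently of the AM? */
    static boolean hasOwnResourceRequirement(Strategy s) {
        // Reuse ties the clean-up size to the AM size (until resize exists).
        return s != Strategy.REUSE_AM_CONTAINER;
    }

    /** Is a resource certain to be available when the final AM exits? */
    static boolean resourceGuaranteedAtExit(Strategy s) {
        // A fresh request may starve if the cluster has become busy.
        return s != Strategy.REQUEST_AFTER_EXIT;
    }

    /** Number of reservations made that are never used for clean-up. */
    static int wastedReservations(Strategy s, int amAttempts) {
        if (amAttempts < 1) {
            throw new IllegalArgumentException("amAttempts must be >= 1");
        }
        // Assumption: only the final attempt's reservation is consumed.
        return s == Strategy.RESERVE_WITH_AM ? amAttempts - 1 : 0;
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.printf("%s ownSize=%b guaranteed=%b wasted(3 attempts)=%d%n",
                    s,
                    hasOwnResourceRequirement(s),
                    resourceGuaranteedAtExit(s),
                    wastedReservations(s, 3));
        }
    }
}
```

In this model, reuse is the only option that both guarantees a resource at exit and wastes no reservations, at the cost of not having an independent size.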

Those are all the options that I can think of for clean-up container
scheduling, and that is why I propose that we simply reuse the AM container
resource.

bq. My token concern is related to long lived apps: what tokens will they get?

Currently, we could just give it all the latest tokens that the AM has. I
understand that for LRS apps this is not enough. But I think that the AM has a
similar issue with token renewal/token update; we could fix those
bq. How does this mix up with pre-emption?

This is a good point. The resource for the clean-up container still belongs to
the application. I think that we could do either of the following:
* if the container is a clean-up container, we do not pre-empt it; or
* if the clean-up container is pre-empted, we simply stop the clean-up process
without retrying, and mark the clean-up as failed.
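
The two pre-emption behaviours above could be sketched roughly as follows. This is a hypothetical illustration only: CleanupPreemptionSketch, Policy, State, and both methods are made-up names and do not correspond to any existing scheduler code.

```java
/**
 * Toy model of the two pre-emption behaviours for clean-up containers.
 * All names are hypothetical; this is not YARN API.
 */
final class CleanupPreemptionSketch {

    enum Policy {
        EXEMPT_FROM_PREEMPTION,  // scheduler never pre-empts a clean-up container
        FAIL_WITHOUT_RETRY       // pre-emption is allowed; clean-up is not retried
    }

    enum State { RUNNING, CLEANUP_FAILED }

    /** Should the scheduler treat this container as a pre-emption candidate? */
    static boolean isPreemptable(boolean isCleanupContainer, Policy policy) {
        // Ordinary containers are always candidates in this model.
        return !isCleanupContainer || policy != Policy.EXEMPT_FROM_PREEMPTION;
    }

    /** State of a running clean-up container after a pre-emption attempt. */
    static State afterPreemptionAttempt(Policy policy) {
        if (!isPreemptable(true, policy)) {
            return State.RUNNING;       // the attempt is ignored
        }
        return State.CLEANUP_FAILED;    // stopped and marked failed, no retry
    }

    public static void main(String[] args) {
        for (Policy p : Policy.values()) {
            System.out.println(p + " -> " + afterPreemptionAttempt(p));
        }
    }
}
```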

> YARN should have a way to run post-application cleanup
> ------------------------------------------------------
>                 Key: YARN-2261
>                 URL: https://issues.apache.org/jira/browse/YARN-2261
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
> See MAPREDUCE-5956 for context. Specific options are at 
> https://issues.apache.org/jira/browse/MAPREDUCE-5956?focusedCommentId=14054562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054562.
