[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391872#comment-14391872 ]

Vinod Kumar Vavilapalli commented on YARN-2261:
-----------------------------------------------

MAPREDUCE-4099 originally facilitated this for MapReduce in a not-so-ideal way.

YARN should have a way to run post-application cleanup
------------------------------------------------------
Key: YARN-2261
URL: https://issues.apache.org/jira/browse/YARN-2261
Project: Hadoop YARN
Issue Type: New Feature
Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

See MAPREDUCE-5956 for context. Specific options are at https://issues.apache.org/jira/browse/MAPREDUCE-5956?focusedCommentId=14054562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054562.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325548#comment-14325548 ]

Bikas Saha commented on YARN-2261:
----------------------------------

Looks like AM preemption will not fail the AM, so the comments about AM preemption are probably not valid.
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325547#comment-14325547 ]

Bikas Saha commented on YARN-2261:
----------------------------------

Sounds reasonable, though questions such as how YARN exposes this information to the user will eventually need to be thought about. Currently the RM page shows running applications. How will it show cleanup in progress?

Can the original AM be preempted due to lack of resources? In that case, how will we launch the cleanup container? The same problem probably exists today, because a preempted AM would not be able to clean up. However, moving this responsibility from the user to YARN (as proposed in this jira) makes it a YARN problem to solve.
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325472#comment-14325472 ]

Xuan Gong commented on YARN-2261:
---------------------------------

[~ste...@apache.org] [~bikassaha] [~vinodkv] any further comments on the proposal?
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323321#comment-14323321 ]

Xuan Gong commented on YARN-2261:
---------------------------------

Thanks for the comments, Steve.

bq. Maybe the cleanup containers could have lower limits on allocation: 1 vcore max... I'd advocate less memory, but if pmem limits are turned on that's dangerous.
bq. would there be any actual/best effort offerings of the interval between AM termination and clean up scheduling?

I thought about this. The options are:
* Request the resource for the clean-up container separately, after the application has finished, failed, or been killed. In this case, the clean-up container can have its own resource requirement. But, as Vinod commented, the clean-up container may not get resources because the cluster may have gotten busy after the final AM exit.
* Request the resource for the clean-up container at the same time as we request the resource for the AM container, and reserve it. After the final AM exits, we use the reserved resource to launch the clean-up container. In this case too, the clean-up container can have its own resource requirement. But this option is not ideal, because the AM does not know whether it is the final attempt. Even the RM does not know whether the current attempt is the final one; it only learns that the previous attempt was final when it decides whether it needs to launch the next attempt. So we would need to request the resource for the clean-up container every time we request the resource for an AM container, and if the current AM container is not the final one, the reserved resource is wasted.
* Reuse the AM container resource, as I proposed. Once the container-resizing feature is ready, we could definitely let the clean-up container have its own resource requirement.

Those are all the options I can think of for clean-up container scheduling, which is why I propose that we simply reuse the AM container resource.

bq. My token concern is related to long lived apps: what tokens will they get?

Currently, we could just give it all the latest tokens that the AM has. I understand that for LRS apps this is not enough, but the AM has a similar issue with token renewal/update, so we could fix those together.

bq. How does this mix up with pre-emption?

This is a good point. The resource for the clean-up container still belongs to the application. I think we could do one of the following:
* if the container is a clean-up container, do not pre-empt it, OR
* if the clean-up container is pre-empted, simply stop the clean-up process without retry and mark it as a clean-up failure.
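The two pre-emption options at the end of the comment above can be sketched as a small decision function. This is only an illustration in Python, not YARN scheduler code: the names `PreemptionPolicy`, `CleanupState` and `on_preemption_request` are hypothetical and were invented for this example.

```python
from enum import Enum


class CleanupState(Enum):
    RUNNING = "running"
    FAILED = "failed"


class PreemptionPolicy(Enum):
    # Hypothetical policy names, one per option in the comment.
    NEVER_PREEMPT = "never_preempt"            # clean-up containers are exempt
    FAIL_WITHOUT_RETRY = "fail_without_retry"  # pre-emption ends the clean-up


def on_preemption_request(policy, state):
    """Decide what happens when the scheduler wants a clean-up container back.

    Returns a (preempted, new_state, retry) tuple. Under either option the
    clean-up is never retried as a result of pre-emption.
    """
    if policy is PreemptionPolicy.NEVER_PREEMPT:
        # Option 1: refuse the request and leave the clean-up untouched.
        return (False, state, False)
    # Option 2: give the container up and record a clean-up failure.
    return (True, CleanupState.FAILED, False)
```

Under the first policy the container keeps running; under the second the clean-up is recorded as failed and is not rescheduled.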
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321376#comment-14321376 ]

Steve Loughran commented on YARN-2261:
--------------------------------------

# Maybe the cleanup containers could have lower limits on allocation: 1 vcore max... I'd advocate less memory, but if pmem limits are turned on that's dangerous.
# My token concern is related to long-lived apps: what tokens will they get?
# How does this mix up with pre-emption?
# Would there be any actual/best-effort offerings of the interval between AM termination and cleanup scheduling?

I can see the appeal of rerunning in the AM container; there's a special case of repeated AM failure where the cleanup code may still be needed. Perhaps some history could be passed in (env var = FS URL of history) so cleanup logic could be smarter.

I think I'd really need to design a cleanup routine for Slider to know what we'd actually need to run. Probably some of
* YARN registry cleanup (unless YARN-2571 actually gets committed, that being a far simpler option)
* HDFS cleanup (potentially)
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321149#comment-14321149 ]

Xuan Gong commented on YARN-2261:
---------------------------------

First of all, I think users should know what the clean-up container needs to do and how, and it is their responsibility to give the RM hints about how the clean-up container can be launched. YARN, based on those hints, should provide enough resources to launch the container and monitor it.

The clean-up container itself should have at least the following properties:
* It should be optional.
* It should have a configurable timeout. We do not want the container to take a long time to do the clean-up.
* It can be retried.

The most challenging part of this problem is how and when the RM gets the resource to launch the clean-up container. The proposal is that when the AM container finishes, instead of releasing the container immediately, we re-use its resource to launch the clean-up container. As with the AM container, the RM will launch the clean-up container. That way, the latest token information can be provided.

[~bikassaha] [~ste...@apache.org] suggestions?
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193324#comment-14193324 ]

Steve Loughran commented on YARN-2261:
--------------------------------------

One problem with this proposal as is: it doesn't address AM failure, especially the final "will not be restarted" case. If an AM could specify the cleanup routine to execute on cleanup (same resources as for the normal AM), this would be possible. For Slider we'd just add a new CLI option to the same entry point, something like {{-D slider.am.cleanup=true}}
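As a rough sketch of what "a new CLI option to the same entry point" might look like, the snippet below picks a mode from a `-D key=value` argument. It is written in Python purely for illustration; Slider's real argument handling is not shown in this thread, and the helper names `parse_definitions` and `select_mode` are hypothetical. Only the `slider.am.cleanup` key comes from the comment above.

```python
def parse_definitions(argv):
    """Collect '-D key=value' pairs from an argument list into a dict."""
    defs = {}
    i = 0
    while i < len(argv):
        if argv[i] == "-D" and i + 1 < len(argv) and "=" in argv[i + 1]:
            key, value = argv[i + 1].split("=", 1)
            defs[key] = value
            i += 2  # skip the flag and its value
        else:
            i += 1
    return defs


def select_mode(argv):
    """Return 'cleanup' when the cleanup flag is set to true, else 'normal'."""
    defs = parse_definitions(argv)
    flag = defs.get("slider.am.cleanup", "false")
    return "cleanup" if flag.lower() == "true" else "normal"
```

The same entry point would then branch on the returned mode: run the normal AM logic, or run only the cleanup routine and exit.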
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058745#comment-14058745 ]

Robert Joseph Evans commented on YARN-2261:
-------------------------------------------

+1, either approach seems fine to me. Vinod's requires an opt-in, which is nice from a backwards-compatibility standpoint.

Also, do we want to count the cleanup container as a running application? We definitely need to count its resources against any queue it is a part of, but for a queue that is configured to run mostly large applications, other applications could back up behind the cleanup containers.
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058979#comment-14058979 ]

Bikas Saha commented on YARN-2261:
----------------------------------

The cleanup would be indistinguishable from an AM that is cleaning up after job completion (as happens today). In particular, if we use the second approach (via AM cleanup mode), this would be virtually indistinguishable from what happens today.
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059060#comment-14059060 ]

Robert Joseph Evans commented on YARN-2261:
-------------------------------------------

Yes, and that is not necessarily a good thing, especially if cleanup can take a relatively long time.
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059094#comment-14059094 ]

Bikas Saha commented on YARN-2261:
----------------------------------

Would that be an existing issue that needs to be tracked separately?
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054571#comment-14054571 ]

Vinod Kumar Vavilapalli commented on YARN-2261:
-----------------------------------------------

The proposal here is to have a YARN application-level cleanup container that runs only as the last thing for an application in the cluster.
- In a way, we already have this today, as we let AMs hang around for a while (by default, 10 mins) *after* unregister; this feature makes it explicit.
- For those who have been around this space for a while, this is akin to MR job-cleanup.
- This feature lets apps submit a separate container-launch-context for cleanup, one that is only run after the app is done for real.
- Clearly, it will
-- Be optional.
-- Have timeouts on how much time it can take to finish (default, overridable, and upper limit. Default = today's time for AMs to exit after unregister?)
-- Have resource request limits like usual
-- May have its own retries (Cleanup failure != Application failure as today?)

Some challenges
- Cleanup container may not get resources because the cluster may have gotten busy after the final AM exit. Solution is to reserve (part of) the resources used by the last AM for use by the cleanup container.
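The timeout rule in the list above (a default, an app-supplied override, and a hard upper limit) could be sketched as follows. This is an illustration in Python only: the function name, the constant names and the 30-minute upper limit are assumptions made up for the example, not YARN configuration. The 10-minute default simply echoes the post-unregister window mentioned in the comment.

```python
from typing import Optional

# Hypothetical values for illustration only; not real YARN settings.
DEFAULT_CLEANUP_TIMEOUT_SECS = 10 * 60  # echoes the ~10 min post-unregister window
MAX_CLEANUP_TIMEOUT_SECS = 30 * 60      # assumed cluster-wide upper limit


def effective_cleanup_timeout(requested_secs: Optional[int]) -> int:
    """Return the timeout to enforce on a cleanup container.

    None means the app did not override the default. Any override is
    clamped to the upper limit so cleanup cannot hold resources forever.
    """
    if requested_secs is None:
        return DEFAULT_CLEANUP_TIMEOUT_SECS
    if requested_secs <= 0:
        raise ValueError("cleanup timeout must be positive")
    return min(requested_secs, MAX_CLEANUP_TIMEOUT_SECS)
```

An app that asks for two hours would, under these assumed limits, be capped at thirty minutes, while an app that asks for nothing gets the default.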
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055231#comment-14055231 ]

Bikas Saha commented on YARN-2261:
----------------------------------

+1 for having the control/responsibility in YARN.

An alternative that may fit better with the RM model of launching AMs is to optionally have the RM run the AM in cleanup mode. This way the cleanup logic can reside in the AM as it does today, and the RM does not need to learn any new tricks about launching anything other than the AM. The existing launch context is used to launch the AM, and the AM is told (via env or via register) that it's in cleanup mode. The AM can use its logic to do cleanup and then successfully unregister with the RM. Until the unregister happens, the RM can keep restarting the AM in cleanup mode up to a maximum of N times (to handle unexpected failures).

When an AM is running in cleanup mode, it will not be allowed to make any allocation requests. This can be handled via AMRMClient so that AMs that use AMRMClient don't need to do anything. The ApplicationMasterService, of course, will need to handle this for non-AMRMClient AMs. By this method, minimal changes will be needed in the API and RM internals to enable this feature in a compatible manner.
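The cleanup-mode alternative described above amounts to a bounded relaunch loop plus a guard on allocation requests. The sketch below models both in Python under stated assumptions: `launch_am` is a stand-in for the RM launching the AM with its existing launch context, and the environment variable names `AM_CLEANUP_MODE` and `AM_CLEANUP_ATTEMPT` are hypothetical. None of this is the actual RM, AMRMClient or ApplicationMasterService code.

```python
from typing import Callable, Dict


def run_cleanup_mode(launch_am: Callable[[Dict[str, str]], bool],
                     max_attempts: int) -> bool:
    """Relaunch the AM in cleanup mode until it unregisters or attempts run out.

    launch_am receives the extra environment for the attempt and returns
    True if the AM unregistered successfully, False if the attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        # Hypothetical env names that tell the AM it is in cleanup mode.
        env = {"AM_CLEANUP_MODE": "true", "AM_CLEANUP_ATTEMPT": str(attempt)}
        if launch_am(env):
            return True  # cleanup finished and the AM unregistered
    return False  # gave up after max_attempts unexpected failures


def check_allocation_allowed(cleanup_mode: bool, num_containers_requested: int) -> None:
    """Reject allocation requests from an AM that is running in cleanup mode."""
    if cleanup_mode and num_containers_requested > 0:
        raise PermissionError("allocation requests are not allowed in cleanup mode")
```

A client library could apply the second check locally, while the service side would still need the same guard for AMs that talk to it directly.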