[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate "long lived"
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159727#comment-15159727 ]

Vinod Kumar Vavilapalli commented on YARN-1039:
-----------------------------------------------

Moved this to be a sub-task of YARN-4692 given the renewed focus there.

> Add parameter for YARN resource requests to indicate "long lived"
> -----------------------------------------------------------------
>
> Key: YARN-1039
> URL: https://issues.apache.org/jira/browse/YARN-1039
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.1-beta
> Reporter: Steve Loughran
> Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch
>
> A container request could support a new parameter "long-lived". This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node.
> Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584957#comment-14584957 ]

Carlo Curino commented on YARN-1039:
------------------------------------

Craig, can you comment on which properties of service vs. batch containers you are alluding to, besides the fact that one has an infinite duration while the other is expected to have a clear completion time? In my mind, if the only property we are capturing is time-to-completion, then we should just use duration, which is inherently more flexible and which we want for other things anyway.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584271#comment-14584271 ]

Craig Welch commented on YARN-1039:
-----------------------------------

I'll go back to my earlier assertion: I think it's not duration we are really concerned with here (that is covered in various ways in other places) but rather the notion of an application type, batch or service. The defining characteristic is the potential for continuous operation (service) versus a unit of work which will run to completion (batch), and an enumeration of service and batch makes sense to me.

In any case, [~vinodkv], there still seems to be enough diversity of opinion here to require some ongoing discussion/reconciliation, so I will leave this in your capable hands.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564677#comment-14564677 ]

Steve Loughran commented on YARN-1039:
--------------------------------------

One aspect of a flag is that it could be used in schedulers, not just when placing/scheduling containers with the bit set, but when looking at where to place new work.

Example: if all the containers on a single host are tagged as long-lived, there's little point in waiting for a free space to appear there before downgrading to launching a container requested against that host elsewhere.

> Add parameter for YARN resource requests to indicate "long lived"
> -----------------------------------------------------------------
>
> Key: YARN-1039
> URL: https://issues.apache.org/jira/browse/YARN-1039
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.1-beta
> Reporter: Steve Loughran
> Assignee: Craig Welch
> Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch
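The locality example above can be sketched in a few lines of Python. This is an illustration only, not YARN scheduler code: `RunningContainer`, its `long_lived` field, and `worth_waiting_for_host` are hypothetical names, and the rule (wait only if something short-lived is running, or the host is empty) is one possible reading of the comment.

```python
from dataclasses import dataclass


@dataclass
class RunningContainer:
    container_id: str
    long_lived: bool  # hypothetical flag, as proposed in this issue


def worth_waiting_for_host(containers_on_host):
    """Return True if waiting for space on the requested host may pay off."""
    # An empty host has nothing occupying it, so there is no reason to move on.
    if not containers_on_host:
        return True
    # If at least one container is short-lived, space may free up soon.
    return any(not c.long_lived for c in containers_on_host)


busy_with_services = [RunningContainer("c1", True), RunningContainer("c2", True)]
mixed = [RunningContainer("c3", True), RunningContainer("c4", False)]
```

With these inputs, a scheduler following this rule would relax locality immediately for the first host and keep waiting for the second.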
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549505#comment-14549505 ]

Chris Douglas commented on YARN-1039:
-------------------------------------

The semantics of a boolean flag are opaque. The policies enforced by different RM configurations (and versions) will not be, and cannot be made to be, consistent. Application and container priority are already encoded (or in progress, YARN-1963), so it's not just preemption priority or cost. Affinity and anti-affinity are also covered by different features. Discussion has been wide-ranging because it is unclear what "long-lived" guarantees across existing features (beyond removing the progress bar from the UI, which I hope we can stop mentioning).

An implementation that only recognizes infinite and undefined leases could be mapped into duration. Lease duration could also be used to communicate when security tokens cannot be renewed, short-lived guarantees for YARN-2877 containers, boundaries of YARN-1051 reservations, and planned decommissioning. In contrast, the long-lived flag cannot be used for these cases. We could expose probabilistic guarantees (which are what we give in reality), but that's a later issue.

Considering the blockers more concretely:

bq. (a) reservations (b) white-listed requests or (c) node-label requests getting stuck on a node used by other services' containers that don't exit.

Aren't these handled by adding a timeout to allocations, which would also catch cases where this flag is _not_ set? The timeout value could be set across the scheduler to start, but could even be user-visible in later versions...

All said, I don't have time to work on this, agree the API can be evolved from the flag, and am -0 on it.
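The allocation timeout suggested above might look roughly like the following Python sketch. It is not taken from any YARN scheduler; `PendingAllocation`, `timed_out`, and `nodes_to_release` are hypothetical names, and a single scheduler-wide `timeout_s` is assumed, as in the comment's starting point.

```python
from dataclasses import dataclass


@dataclass
class PendingAllocation:
    node: str             # node the request is currently pinned to
    waiting_since: float  # time (seconds) at which the request started waiting


def timed_out(pending, now, timeout_s):
    """True once the request has waited on its node for at least timeout_s."""
    return now - pending.waiting_since >= timeout_s


def nodes_to_release(pending_allocations, now, timeout_s):
    """Nodes whose pending requests should be released and retried elsewhere."""
    return [p.node for p in pending_allocations if timed_out(p, now, timeout_s)]


pending = [PendingAllocation("n1", 0.0), PendingAllocation("n2", 50.0)]
stuck = nodes_to_release(pending, now=60.0, timeout_s=30.0)
```

Note that this check never looks at whether the blocking containers are long-lived, which is the point of the comment: it also catches requests stuck behind containers that did not set the flag.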
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541612#comment-14541612 ]

Steve Loughran commented on YARN-1039:
--------------------------------------

+1 for a long-lived bit. Services can set the flag, and it is up to future versions of Hadoop to implement the logic to go with it.

FWIW, I'd make the first use of the patch the YARN-1079 progress bar. Why? It's the least amount of server-side code changes (no scheduling patches), it fixes a tangible problem for users (progress bar is confusing), and it provides an immediate benefit to the apps, so encouraging them to set the flag, maybe even by reflection if they want to stay compatible across Hadoop versions.
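The "set the flag by reflection" idea can be illustrated with a small Python sketch, where `getattr` plays the role that `java.lang.reflect` would play in a real Hadoop client. The setter name `set_long_lived` and both request classes are hypothetical; no such method is confirmed by this thread.

```python
class OldRequest:
    """Stand-in for a request class from a release without the flag."""


class NewRequest:
    """Stand-in for a request class from a release that has the flag."""

    def __init__(self):
        self.long_lived = False

    def set_long_lived(self, value):  # hypothetical setter name
        self.long_lived = value


def mark_long_lived(request):
    """Set the flag if this version supports it; report whether it was set."""
    setter = getattr(request, "set_long_lived", None)
    if callable(setter):
        setter(True)
        return True
    return False  # older version: silently carry on without the flag


new_request = NewRequest()
applied_new = mark_long_lived(new_request)
applied_old = mark_long_lived(OldRequest())
```

The same application code then runs against both versions and only benefits from the flag where it exists.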
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541487#comment-14541487 ]

Carlo Curino commented on YARN-1039:
------------------------------------

I agree this conversation floated all over the map. Thanks for instigating convergence.

I favor the duration as it easily covers the boolean use case, and gives us a little extra information bandwidth (i.e., accommodates a few upcoming use cases with no changes). However, I understand where the pushback would come from, and I can't argue too much against keeping things more minimal to start.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541129#comment-14541129 ]

Vinod Kumar Vavilapalli commented on YARN-1039:
-----------------------------------------------

*sigh*

This JIRA was all over the place.

Can we please agree not to discuss here *how* long-running-services related scheduling features, UI, log-aggregation, security-tokens should be implemented? There are separate JIRAs with good progress on each of them. Let's also please not discuss how the platform _could_ make use of the notion of a long-lived nature of an application/container. I understand that the type of usage shall dictate what the input will look like, but hold on to that for a second.

h3. Blocker

I've already started seeing real-life situations where we need the RM to know about the long-lived'ness of a container and an application. The most prominent of these are (a) reservations (b) white-listed requests or (c) node-label requests getting stuck on a node used by other services' containers that don't exit. Absence of this notion is increasingly becoming a *blocker* for running services. I'd like to get some progress here.

h3. Short Proposal

There seems to be general agreement on having the notion itself. Here are the proposals and dimensions:
# The notion at app level, at per-container level
# A boolean flag, an enum, a duration

I propose that we solve the blocker use case that I pointed out above with a boolean at both app level and container level. Tomorrow, when somebody implements a duration-based bin-packing scheduling policy, they can add in the notion of a duration and then reconcile the boolean with infinity values on the duration. The enum proposal is to me a dup of YARN-3409, which covers a much larger problem space.

Thoughts?
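One way the boolean could later be reconciled with a duration, as the proposal above suggests, is sketched below in Python. This is an assumption-laden illustration rather than the patch attached to this issue: `ContainerRequest`, `long_lived`, and `expected_duration_s` are hypothetical names, and the rule that the flag wins over any finite duration is a choice made for the sketch.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerRequest:
    long_lived: bool = False                      # the boolean proposed here
    expected_duration_s: Optional[float] = None   # a possible later addition


def effective_duration(request):
    """Map the flag and the optional duration onto one duration value.

    Returns math.inf for long-lived requests, the stated duration when one
    is given, and None when nothing is known (an "undefined" lease).
    """
    if request.long_lived:
        return math.inf  # assumption: the flag overrides a finite duration
    return request.expected_duration_s


service = ContainerRequest(long_lived=True)
batch_task = ContainerRequest(expected_duration_s=300.0)
unknown = ContainerRequest()
```

A scheduler written against `effective_duration` would keep working for clients that only ever set the boolean.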
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312096#comment-14312096 ]

Sunil G commented on YARN-1039:
-------------------------------

Duration is a better metric than token names. However, to arrive at this duration metric, either a few trial runs of the application are needed, or the AM can raise new container requests based on the running time of its previous containers.

So a feedback mechanism to the AM emerges here from the RM's perspective: the AM is supposed to run a container for a given duration, and once that limit is crossed, the AM can take some action. I feel this will add a good amount of flexibility.
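A minimal Python sketch of the feedback loop described above, assuming the AM keeps the runtimes of its earlier containers. The function names, the use of the median, and the `slack` factor are all choices made for illustration; nothing here is an existing YARN API.

```python
from statistics import median


def estimate_duration(previous_runtimes_s, default_s=None):
    """Estimate the next container's duration from earlier runtimes."""
    if not previous_runtimes_s:
        return default_s  # no history yet, e.g. before any trial run
    # The median is less sensitive to one straggler than the mean.
    return median(previous_runtimes_s)


def overran(elapsed_s, estimate_s, slack=2.0):
    """True when a container has run well past its estimate."""
    return estimate_s is not None and elapsed_s > slack * estimate_s


history = [100.0, 120.0, 500.0]
estimate = estimate_duration(history)
```

When `overran` returns True, the AM could take whatever action it chooses, such as logging a warning or requesting a replacement container.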
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310683#comment-14310683 ]

Steve Loughran commented on YARN-1039:
--------------------------------------

Specifying some range of likely duration may work... certainly if something takes very much longer than expected, that's potentially a warning that something has gone wrong... though really the AM should be handling that.

For anyone implementing pre-emption in a scheduler, how would longevity flags be interpreted? As a hint that containers won't be going away any time soon, so that pre-emption is the best strategy for scheduling other work?
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311146#comment-14311146 ]

Carlo Curino commented on YARN-1039:
------------------------------------

An idea of how long a container will last would be very, very useful for preemption. The ProportionalCapacityPreemptionPolicy currently needs to guess how many containers would naturally complete before the preemption action happens (to avoid over-shooting). The information about container durations (even if rough) would make this a much more informed guess. Again, this is for optimization purposes, not correctness, so we can tolerate a fair bit of error.
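To make the preemption argument concrete, here is a toy Python sketch of how rough durations could inform that guess. It does not reproduce the logic of ProportionalCapacityPreemptionPolicy; the tuple layout, the function names, and the idea of a fixed `window_s` before the preemption action are assumptions made for illustration.

```python
import math


def expected_natural_completions(containers, now, window_s):
    """Count containers expected to finish before the preemption action.

    containers: list of (start_time_s, expected_duration_s) pairs, where the
    duration may be None (unknown) or math.inf (never ends on its own).
    """
    count = 0
    for start, duration in containers:
        if duration is None or math.isinf(duration):
            continue  # cannot assume these will go away by themselves
        if start + duration <= now + window_s:
            count += 1
    return count


def containers_to_preempt(needed, containers, now, window_s):
    """Preempt only what natural completions are not expected to cover."""
    freed = expected_natural_completions(containers, now, window_s)
    return max(0, needed - freed)


running = [(0.0, 100.0), (0.0, 1000.0), (50.0, math.inf), (0.0, None)]
```

Because the estimate only tunes how much is preempted, a wrong duration leads to a little over- or under-shooting rather than an incorrect schedule, which matches the tolerance for error mentioned above.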
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310328#comment-14310328 ]

Carlo Curino commented on YARN-1039:
------------------------------------

Tossing some fire back on duration.

I read your concerns about applications' ability to provide good values. However, I'd rather have the app provide its best duration estimate (and the framework round it or bucket it) than have the app provide a coarse-grained tag-based version in the first place. Changing cluster configurations and policies might turn what used to be a short task into something not that short, which we want to handle differently, and so on. In a sense, asking for duration prevents us from relying on what the application will judge as short/long, etc.

As another example, based on whatever mechanisms for log aggregation we will have in the future, we can change our mind about what the cut-points for short/long are. For example, because a new technique makes it very cheap and we want to provide much more frequent feedback to users.

Bottom line, I find duration a rather neutral thing to ask for, versus something which is more opinion-based, and corner cases like never-ending services are easily handled with -1 or +inf values.

I also agree that there are many other use cases for tags that emerged in the discussion, which have clear value and are by no means covered by duration.
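The "framework buckets the estimate" idea can be sketched as follows in Python. The cut-points, labels, and the name `bucket_duration` are hypothetical; the only details taken from the comment are that the application supplies a raw duration, that the framework owns the cut-points, and that -1 or +inf marks a never-ending service.

```python
import bisect
import math


def bucket_duration(duration_s,
                    cut_points_s=(600, 86400),
                    labels=("short", "medium", "long")):
    """Map a raw duration estimate onto a framework-defined bucket."""
    # Negative values (such as -1) and +inf mean "never ends on its own".
    if duration_s < 0 or math.isinf(duration_s):
        return "unbounded"
    # bisect_right finds how many cut-points the duration has reached.
    return labels[bisect.bisect_right(cut_points_s, duration_s)]


one_hour = 3600
default_bucket = bucket_duration(one_hour)
retuned_bucket = bucket_duration(one_hour, cut_points_s=(60, 600))
```

The same one-hour estimate lands in a different bucket once the cluster retunes its cut-points, without the application changing anything, which is the flexibility argued for above.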
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297783#comment-14297783 ]

Craig Welch commented on YARN-1039:
-----------------------------------

[~chris.douglas]

bq. YARN shouldn't understand the lifecycle for a service or the progress/dependencies for task containers

That's not necessarily so. There are some cases where the type of life cycle for an application is important, for example, when determining whether it is open-ended (service) or a batch process which entails a notion of progress (session), at least for purposes of display.

I think we need to rescope and clarify this jira a bit so that we can make progress - there are a number of items in the original problem statement and subsequent comments which have been taken on elsewhere and so really no longer make sense to pursue here. Here's an attempt at a breakdown:

bq. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node

I think this is now clearly covered by [YARN-796]. Nodes having qualities (including operational qualities such as these) is one of the core purposes of that work; it makes no sense to duplicate it here, and so it should be de-scoped from this jira.

bq. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node

As [~ste...@apache.org] mentioned in an earlier comment [https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14038041&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14038041], affinity / anti-affinity is covered in a more general sense in [YARN-1042]. The above component of this jira is really just such a case, and so it should be covered with that general solution and dropped from scope as well. There may be some interest in informing that solution based on a generalized service setting, but to really understand that, the affinity approach needs to be worked out - and I think the affinity approach will really need to inform/integrate with this rather than the other way around, and integration should be approached as part of that effort.

That leaves nothing, so we can close the jira ;-)

Not quite, there were several things added in comments:

Token management - handled in [YARN-941]

Scheduler hints not related to node categories or anti-affinity (opportunistic scheduling, etc.) - this does strike me as something better handled via the duration route et al. [YARN-2877] [YARN-1051] and not something which needs to be replicated here.

I think that really just leaves the progress bar (and potentially other display-related items). This is covered by [YARN-1079].

I suggest, then, that we either rescope this jira to providing the lifecycle information as an application tag [https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14039679&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14039679], as suggested by [~zjshen] early on, or close it and cover the work as part of [YARN-1079]. I originally objected to that approach on the basis that tags appeared to be a display-type feature which did not fit this effort, but if rescoped as I'm proposing, it becomes such a feature, and I think that approach is now a good fit.

Thoughts?
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298135#comment-14298135 ]

Chris Douglas commented on YARN-1039:
-------------------------------------

bq. That's not necessarily so, there are some cases where the type of life cycle for an application is important, for example, when determining whether or not it is open-ended (service) or a batch process which entails a notion of progress (session), at least for purposes of display.

That's a fair distinction. Would you agree the YARN _scheduler_ should not use detailed information about progress, task dependencies, or service lifecycles?

If an AM registers with a tag that affects the attributes displayed in dashboards, then issues like YARN-1079 can be resolved cleanly, as you and Zhijie propose. Steve has a point about mixed-mode AMs that run both long and short-lived containers (e.g., a long-lived service supporting a workflow composed of short tasks). If it's solely for display, then an enum seems adequate, but I'd like to better understand the use cases.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294557#comment-14294557 ]

Craig Welch commented on YARN-1039:
-----------------------------------

[~chris.douglas] what's the proper duration for a service which does not have a pre-defined lifetime?

This distinction is not really about how long it will run but more about what the lifecycle of this app is. As [~ste...@apache.org] points out, is it session or batch oriented (something which has a defined set of work, so it has a notion of progress to completion), or is it a running process with an indeterminate/unknown lifetime which handles whatever work is sent its way (a service)? This is really the distinction needed here: it's a qualitative difference regarding a lifecycle, and the notion of an enumeration of lifecycle types makes sense for this. Users will often have no idea how long their application will run, but they will generally have a clear notion of its lifecycle.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294529#comment-14294529 ]

Chris Douglas commented on YARN-1039:
-------------------------------------

Requiring accurate estimates is not realistic, but no service runs forever in the same container(s). If container leases can be renewed/refreshed, that's a manageable and realistic guarantee for the user (couldn't find a JIRA; it must exist). Migration, decommission, OS upgrades, and other operations-in-time on containers seem necessary to support long-running services, since preemption is comparably heavy-handed.

Specifying a precise duration may be a little pedantic for the existing use cases, but it seems like the right abstraction.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294613#comment-14294613 ]

Chris Douglas commented on YARN-1039:
-------------------------------------

[~cwelch] YARN shouldn't understand the lifecycle for a service or the progress/dependencies for task containers. As proposed, an AM will receive a lease on a container for some duration. Before the lease expires, it can relinquish the lease or request that it be renewed. While this adds some complexity in the AM implementation (it needs to track and renew its container leases), it's mostly library code that admits straightforward, naive implementations. The most obvious strawman would request all resources at the longest possible duration and always renew.

Mapping an enumeration expressing an AM lifecycle into a policy for requesting, refreshing, and managing resources is an excellent client-side abstraction. Even if an implementation of YARN only receives (and only issues) leases from a fixed set of values, the underlying abstraction can admit arbitrary durations. An enumeration is a good API for applications, but the RM framework could have a more fine-grained substrate.

Leases actually help services run under YARN. By way of example, refusing to renew a lease could signal that the node will be decommissioned, or that some cluster-wide invariant (like balanced utilization or fairness) is better met by (re)moving that container. Refusing to renew a lease, or renewing it for a shorter period, could signal the service to request new containers.
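The lease model described above, including the "always ask for the longest duration and always renew" strawman, can be sketched in Python as below. Everything here is hypothetical (`Lease`, `LeaseGrantor`, `renew_all`, the `draining` set); it is a toy model of the idea, not a YARN protocol.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    container_id: str
    expires_at: float  # absolute time in seconds


class LeaseGrantor:
    """Toy stand-in for the RM side of lease renewal."""

    def __init__(self, max_duration_s, draining=()):
        self.max_duration_s = max_duration_s
        self.draining = set(draining)  # containers on nodes being drained

    def renew(self, lease, now, requested_s):
        if lease.container_id in self.draining:
            return None  # refusal: a signal that the container should move
        granted_s = min(requested_s, self.max_duration_s)
        return Lease(lease.container_id, now + granted_s)


def renew_all(grantor, leases, now, requested_s=float("inf")):
    """Strawman AM policy: ask for the longest possible lease, always renew."""
    kept, to_replace = [], []
    for lease in leases:
        renewed = grantor.renew(lease, now, requested_s)
        if renewed is None:
            to_replace.append(lease.container_id)
        else:
            kept.append(renewed)
    return kept, to_replace


grantor = LeaseGrantor(max_duration_s=3600, draining={"c2"})
kept, to_replace = renew_all(
    grantor, [Lease("c1", 100.0), Lease("c2", 100.0)], now=90.0)
```

Here the grantor caps the requested infinite lease at its own maximum, and the refused renewal for `c2` tells the AM to request a replacement container elsewhere.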
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285449#comment-14285449 ]

Steve Loughran commented on YARN-1039:
--------------------------------------

bq. However, this can be done either based on historical information (previous waves of this task type or previous execution of the same job) or on application level knowledge.

Historical information is generally the best estimate, though if the input data is different, the duration can be too.

Maybe a simple enum such as short-lived, session, and service: services provide no termination, session = a few hours to a few days (i.e., within the lifespan of Kerberos tokens).
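The three-value enum suggested above could be written down as follows. This Python sketch only encodes what the comment states (three values, and that services provide no termination); the class name `Lifecycle` and the helper `has_expected_end` are hypothetical.

```python
from enum import Enum


class Lifecycle(Enum):
    SHORT_LIVED = "short-lived"  # ordinary task containers
    SESSION = "session"          # hours to days, within Kerberos token lifespan
    SERVICE = "service"          # provides no termination


def has_expected_end(lifecycle):
    """Only services are expected to run without terminating."""
    return lifecycle is not Lifecycle.SERVICE


parsed = Lifecycle("session")
```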
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283734#comment-14283734 ] Steve Loughran commented on YARN-1039: I've always envisaged that the flag could switch on some different policies, though with container preservation across restarts, labels, log aggregation and windows for failure tracking, much of that is dealt with. Otherwise, the longevity flag could be of use in: # RM UI. There's no percentage done any more, only live/not-live. This already causes confusion for our Slider users. # Placement: do you want 100% of a node's capacity to be for long-lived stuff, at the expense of being able to run anything short-lived there? # Pre-emption. The cost of pre-emption may be higher, but at the same time long-lived containers are the ones you may want to pre-empt the most, because the scheduler knows they won't go away any time soon. The easy target is the UI, as that doesn't need scheduling changes, and the current percentage done view doesn't work. Something to indicate live/not live makes more sense (though not red/green unless you don't want colour blind people using your app).
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284104#comment-14284104 ] Carlo Curino commented on YARN-1039: I am happy the conversation is re-ignited. As I mentioned [above | https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14048345&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14048345], the long-lived tag is a coarse-grained version of the notion of duration we added to the ReservationRequest (which tracks ResourceRequest very closely) in YARN-1051. The idea is that the AM could provide an estimate of the task duration, enabling (beyond what Steve already listed above) optimistic scheduling decisions like the ones in YARN-2877 for very short tasks (we ran several experiments and the potential for increased utilization is substantial). Given a duration parameter, expressing long-lived can be done by setting duration to a large value (or MAX_INT, or -1, or whatever convention).
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284325#comment-14284325 ] Craig Welch commented on YARN-1039: Another thought: if we do need this kind of flag, I think we should detach the notion from duration or long life as such. I think it's more about service vs. batch. A service's duration is not necessarily related to any preset notion of a work item that it will start, work on, and complete; it is started to handle work which is given to it, of unknown quantity (potentially many different items), and stopped when no longer needed. It's not so much about the duration as the lifecycle (a batch operation may have a longer runtime than a service, for example). So I'd suggest dropping the temporal flavor and going with service vs. batch, or something along those lines.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284446#comment-14284446 ] Jian Fang commented on YARN-1039: The duration concept comes with good intentions, but what I am really afraid of is that it could introduce a great deal of complexity to YARN if it is not designed properly. First, there are many moving parts under the hood for the estimation; for example, the running time on a 30-node cluster may differ significantly from that on a 300-node cluster. Getting into the measurement and estimation business is very much like getting into the benchmark comparison business, which is very hard in reality. Second, the duration probably relies on Hadoop customers to provide a proper value if YARN is not smart enough to derive it by itself, which could be impractical for many customers. Remember that many Hadoop users are not even developers; many of them rely on high-level components such as Pig and Hive to run Hadoop jobs, and they probably don't know or care about the estimate. As a result, the duration should at most be an enhancement that applies when a value is provided. YARN should still work properly without such a value.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284920#comment-14284920 ] Konstantinos Karanasos commented on YARN-1039: Let me add my thoughts on whether we should allow a duration to be reported instead of just a boolean switch for short tasks. I am actively involved in adding distributed scheduling capabilities ([YARN-2877]). We have performed an extensive experimental evaluation that has shown significant performance improvements in terms of throughput and latency, especially where short tasks are concerned. In that scenario, having the ability to specify the duration of the task is crucial (for deciding what type of container to use [[YARN-2882]], for estimating the waiting time in the NMs [[YARN-2886]], etc.). I understand the concerns that have been raised about how to provide the right task duration. However, this can be done either based on historical information (previous waves of this task type or previous execution of the same job) or on application level knowledge. We are already experimenting with ways to deal with imprecise task durations. That said, I definitely agree with [~john.jian.fang] that the user should not *have to* provide any task duration (i.e., the system should work properly in case no durations are provided), but on the other hand, in case she does, we should be able to take advantage of it. Moreover, as [~curino] pointed out, if the API exposes an integer instead of a boolean, we can simulate the boolean switch (e.g., by setting the value to MAX_INT for long tasks), but if we simply use a boolean, we would have to change the API in the future to support duration.
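The argument that an integer duration subsumes a boolean flag can be illustrated with a small sketch. This is Python for brevity and is not the ResourceRequest API; `ToyRequest`, `placement_hint`, the hint strings, and the 10-second short-task threshold are all assumptions made for the example. The duration is optional, so a request without an estimate still gets default handling.

```python
from dataclasses import dataclass
from typing import Optional

MAX_INT = 2**31 - 1  # sentinel convention for "long lived"; -1 would serve equally well


@dataclass(frozen=True)
class ToyRequest:
    memory_mb: int
    expected_duration_s: Optional[int] = None  # None: the AM gave no estimate


def is_long_lived(req):
    """The boolean switch, recovered from the integer field."""
    return req.expected_duration_s == MAX_INT


def placement_hint(req, short_task_s=10):
    """Return a coarse scheduling hint; requests without an estimate get the default."""
    if req.expected_duration_s is None:
        return "default"
    if is_long_lived(req):
        return "avoid-transient-nodes"
    if req.expected_duration_s <= short_task_s:
        return "opportunistic"
    return "default"
```

The reverse direction does not work: a boolean field cannot express "about five seconds", which is the API-evolution point made in the comment.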
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284290#comment-14284290 ] Wangda Tan commented on YARN-1039: For placement of long-lived requests, YARN-796 could take care of deciding which instance should serve a specific long-lived request. Users can either manually specify the label they want for such long-lived containers, or add rules on the scheduler side that automatically add labels to such long-lived requests.
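The two options in the comment above (a user-specified label, or a scheduler-side rule that adds one) might look like this in outline. It is a Python sketch under assumptions: requests are plain dictionaries, `apply_label_rule` is a hypothetical helper, and the `stable` label name is invented; none of this is taken from the YARN-796 patches.

```python
def apply_label_rule(request, rules):
    """Return a copy of the request with a node label filled in when none was given.

    A label set explicitly by the user always wins over scheduler-side rules.
    """
    if request.get("node_label"):
        return dict(request)
    for predicate, label in rules:
        if predicate(request):
            return {**request, "node_label": label}
    return dict(request)


# Hypothetical rule: route long-lived requests to nodes labelled "stable".
RULES = [(lambda r: r.get("long_lived", False), "stable")]
```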
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284232#comment-14284232 ] Jian Fang commented on YARN-1039: Thanks, Steve, for the clarification. The long-lived concept makes sense now if this flag is associated with a policy switch in YARN. I think the above is only one part of the story. The cluster infrastructure itself is probably another part that we need to consider, just like the spot instance feature in EC2 mentioned in this JIRA. The long-lived concept should have more impact on Hadoop clusters in a cloud environment. For example, instance type could affect container scheduling. We should also take this concept into consideration for elastic features such as graceful expansion and shrinking of a cluster in the cloud. On the other hand, I still think YARN-796 should be used together with the long-lived concept. For example, how would the resource manager know which instance should run a long-lived daemon/task? There should be a mapping between the long-lived concept and the tags/labels provided by the instance, right?
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284288#comment-14284288 ] Wangda Tan commented on YARN-1039: I agree with Carlo on this point. Duration can cover both long-lived and short-lived. It may be hard to estimate the exact running time of a container, but a rough estimate can help the scheduler make better decisions and provide the corresponding information to the user, as Steve mentioned.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284304#comment-14284304 ] Craig Welch commented on YARN-1039: As I understand it (and I may be wrong on this...), the original intent of this JIRA was to provide a boolean switch to control a set of behaviors expected to be important for a long-running service: among other things, what sort of nodes to schedule on and how to handle logs. This could be on a sliding scale based on duration, but I'm not sure that works so well. At what duration do we start to change how we handle logs and/or where we schedule things? While related, I think that converting this from a boolean to a range will make it more difficult to use for the intended use case. I also think that packing all of these behaviors into one parameter might be a negative overall. I do think, to [~john.jian.fang]'s point, that using this to determine where to schedule tasks to avoid spot instances and the like has by now been superseded by Node Labels, and I do not think we should add additional functionality for that here; Node Labels is really the way to handle that part of the use case. That leaves, potentially among other things, affinity/anti-affinity issues (scheduling or not scheduling long-running tasks together) and log handling (how do we tell the system we want log handling for a long-running service, if, in fact, the system needs to be told that). I submit that it would be better to have separate solutions to each of these needs which can be bundled together to achieve the overall use case, as I think that will provide better control without adding too much complexity for the end user. That means we would break this out into affinity/anti-affinity and logging configuration.
We could always have a single parameter (like this one) which sets the others for convenience. I'm not sure we'll actually need it, but I do think that splitting out the bundled functionality into individual items (some of which may already be being worked on elsewhere) is the way to go.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283147#comment-14283147 ] Jian Fang commented on YARN-1039: The container request could specify what tags/labels it requires, right? A tag/label is not really a resource but an attribute IMO, just like short-lived and long-lived. Even if you could specify long-lived or short-lived, the resource manager would still need to translate that into something meaningful, right? Or are you saying that you have some specific logic in YARN to handle long-lived containers? If that is true, then it is a different story. Could you please elaborate a bit more on how long-lived is defined in YARN and what kinds of specific handling it gets? Thanks.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283074#comment-14283074 ] Jian Fang commented on YARN-1039: The term long-lived relies on the resource manager understanding what long-lived means. How would that be defined in the resource manager? Do you still rely on node managers to provide tags/labels and on the resource manager to understand them? If so, shouldn't YARN-796 have already addressed this issue with a more generic way to schedule containers based on tags/labels? Personally, I think YARN-796 is more generic. Take the spot instances mentioned here as an example: customers don't want to schedule AM containers on spot instances either, not just long-lived tasks.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283083#comment-14283083 ] Hadoop QA commented on YARN-1039: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651791/YARN-1039.3.patch against trunk revision 4a44508. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6361//console This message is automatically generated.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283096#comment-14283096 ] Xuan Gong commented on YARN-1039: [~jfeng] bq. If that is true, shouldn't YARN-796 have already addressed this issue with a more generic way to schedule containers based on tags/labels? Yes, it has. But node labeling is for resource scheduling. I think that the application should also identify itself as long-lived or short-lived. If not, how would the RM figure out which resources it should assign to this application?
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048345#comment-14048345 ] Carlo Curino commented on YARN-1039: Hi guys, I am just tuning in now (apologies if I am misinterpreting the conversation), but it seems that some of the proposed changes resemble what we were proposing for the reservation work in YARN-1051. In the sub-task YARN-1708 we propose an extension of ResourceRequest that expresses the duration (or leaseDuration if you prefer) for which resources will be reserved. The same concept could be used here as a hint from the user about how long they expect to hold onto the resources. What I am suggesting is that having a time associated with a ResourceRequest could serve both purposes, and be a generally useful hint to the RM. Thoughts?
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043238#comment-14043238 ] Wangda Tan commented on YARN-1039: bq. it must be set at application creation time and all containers of the app will be considered long lived. This is because the RM does not keep track of individual container requests. I think [~vinodkv]'s suggestion makes more sense to me: https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14041652&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14041652 And as [~cwelch] mentioned, we don't need the constraint that if an app is long-lived, all of its containers must be long-lived; it's better to leave this decision to the app itself.
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041778#comment-14041778 ] Craig Welch commented on YARN-1039: That's along the lines of what I was thinking after talking with [~xgong] and looking around a bit more. It sounds like we need to be able to ask the resource manager for a container for long-lived cases (not on a spot instance, for example), both when launching the AM container (in the ApplicationSubmissionContext) and when the ApplicationMaster wants to get a container later on (ResourceRequestProto). This is really a scheduling hint for the resource manager in both cases. We need to be able to mark an application as long running for other reasons (adjusting progress bar behavior, etc.). We need to be able to tell the node manager that a container will be long running when it is launched (to adjust logging behavior, etc.). An application master may launch instances not like itself (some not long running when it is long running), which it can, as the application master can specify whatever it wants to in the ResourceRequestProto. I do think it would be good to keep the interface as consistent as possible, and we should probably have at least a rough idea of the whole picture before making additions.
I suggest this: An enum of scheduling constraints, initially including only LONG_RUNNING and later affinity, etc.; this is solely for node selection by the resource manager. A repeated field of this enum on the ResourceRequestProto and the ApplicationSubmissionContext; in both cases this is purely a constraint on where the container is placed (the application master in the latter case). Go ahead and use a tag [~zjshen] on the application submission to indicate that an application is long-running for purposes of display (things like the progress bar, etc.); that seems to be an appropriate use for application tags. A boolean value on the ContainerLaunchContextProto to indicate it is long-running. There are some tradeoffs in this approach, but I think it's good overall. All the variations we have identified are covered. It is consistent in how it handles launching a long-running container for both the application master and other containers. It is also consistent with the approach to date with respect to the application submission context and the resource request (where items needed for launching the application master container are added to the application submission context). When other scheduler constraints relevant for an application master are introduced later, the API will not need to change to accommodate them (other than adding them to the enum). We reuse the application tag for display and similar purposes, and in general are adding the minimum necessary to cover the identified cases (I thought it was simplest to just use a boolean on the container launch context; in that case the behavior is one way or the other, and other scheduling constraints don't apply). Thoughts?
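The shape of the proposal above (a constraint enum repeated on the request and the submission context, a tag for display, and a boolean on the launch context) can be sketched as plain data types. This is a Python illustration only; the real records are protobuf messages, and every `Toy*` class, field name, and the `node_is_eligible` helper below are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class SchedulingConstraint(Enum):
    LONG_RUNNING = 1  # the only member proposed initially; affinity etc. could follow


@dataclass
class ToyResourceRequest:
    memory_mb: int
    # Repeated field: purely a constraint on where the container is placed.
    constraints: List[SchedulingConstraint] = field(default_factory=list)


@dataclass
class ToySubmissionContext:
    am_memory_mb: int
    am_constraints: List[SchedulingConstraint] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)  # display-only, e.g. progress bar


@dataclass
class ToyContainerLaunchContext:
    command: str
    long_running: bool = False  # tells the node manager to adjust e.g. log handling


def node_is_eligible(constraints, node_is_transient):
    """Reject transient (e.g. spot-priced) nodes for LONG_RUNNING placements."""
    return not (SchedulingConstraint.LONG_RUNNING in constraints and node_is_transient)
```

Because the AM's own container and later containers both carry the same list type, one eligibility check covers both paths, which is the consistency argument made in the comment.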
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041874#comment-14041874 ] Zhijie Shen commented on YARN-1039: bq. I see. I'd assume that the service flag would imply long-lived, but maybe they could be separated. Just thinking out loud; please correct me if I'm wrong or missing something. Service and long-lived overlap to some extent when describing an application, as we usually think a service is going to run for a long time. However, IMHO, being a service should not be a prerequisite for being long-lived. Theoretically, an MR job can be big enough to run for a long time as well. We may want to distinguish service applications from others by some of the applications' native characteristics. For example, progress is not going to make sense for applications that are labeled service, while we still want it for an MR job even if it runs for days, don't we? Moreover, service sounds like an application-level-only property, and we won't mark a single container as a service. On the other hand, long-lived is used to mark an application that is supposed to run for a long time. However, it can only indicate that the application is likely to run for a long time; it cannot guarantee that it actually will. I'm wondering if we really need to mark an application long-lived at submission. Is it feasible to judge whether an application is long-lived by how much time it has already spent in the cluster, and to handle the long-lived application properly in an implicit way? For example, when we come to AM retry opportunities (one issue for long-lived applications), we can choose to refresh the quota given that the application has been working well for a while. We don't need to rely on a long-lived label. The only reason I can think of why we must have this label upfront is that some special treatments for the long-lived application should start from the beginning.
Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
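The implicit handling suggested in the comment above (refreshing the AM retry quota once an application has run well for a while, rather than relying on a label) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class AmRetryQuota, its fields, and the idea of measuring health as time since the last failure are all hypothetical and are not taken from YARN or from any patch on this issue.

```java
/**
 * Hypothetical sketch: an AM retry budget that refreshes after a healthy period.
 * None of these names come from YARN; they are illustrative only.
 */
final class AmRetryQuota {
    private final int maxAttempts;          // total AM attempts allowed per budget
    private final long healthyWindowMillis; // failure-free time that refreshes the budget
    private int failures;                   // failures counted against the current budget
    private long lastEventMillis;           // start time, or time of the latest failure

    AmRetryQuota(int maxAttempts, long healthyWindowMillis, long startMillis) {
        this.maxAttempts = maxAttempts;
        this.healthyWindowMillis = healthyWindowMillis;
        this.lastEventMillis = startMillis;
    }

    /** Records an AM failure at nowMillis; returns true if another attempt is allowed. */
    boolean recordFailure(long nowMillis) {
        // Assumption: "working well for a while" means no failure for a full window.
        if (nowMillis - lastEventMillis >= healthyWindowMillis) {
            failures = 0; // refresh the quota
        }
        failures++;
        lastEventMillis = nowMillis;
        return failures < maxAttempts;
    }

    public static void main(String[] args) {
        AmRetryQuota quota = new AmRetryQuota(2, 100L, 0L);
        System.out.println(quota.recordFailure(10L));  // first failure, retry allowed
        System.out.println(quota.recordFailure(20L));  // second failure soon after, budget spent
        System.out.println(quota.recordFailure(500L)); // long healthy gap, budget refreshed
    }
}
```

With this approach no flag is needed at submission, but, as the comment notes, any treatment that must apply from the very first allocation (such as avoiding spot nodes) cannot be derived this way.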
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041875#comment-14041875 ] Zhijie Shen commented on YARN-1039: --- Upgrade the jira to major given a long discussion. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042350#comment-14042350 ] Anubhav Dhoot commented on YARN-1039: - The tokens for long lived applications jira is [YARN-941|https://issues.apache.org/jira/browse/YARN-941] Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042370#comment-14042370 ] Steve Loughran commented on YARN-1039: -- bq. We need to have another flag to indicate that the containers requested by an AM will be long lived, it must be set at application creation time and all containers of the app will be considered long lived. This is because the RM does not keep track of individual container requests. I see this, but disagree, as it doesn't meet all use cases. For different requests we may want: long-lived, pre-emptible, anti-affine. This can't go in requests, as you point out, but we already have a per-request flag that really sets a bit in the priority level: the lax placement option. If the other requests set the values at that priority then it is similar. Even so, setting these values in a request is confusing, even today. It would be better to have some operation to get/set the attributes of a priority for requests. This would be a bigger change, and something we may not want to rush into. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042469#comment-14042469 ] Craig Welch commented on YARN-1039: --- {quote}We need to have a flag to indicate an AM is long lived. We need to have another flag to indicate that the containers requested by an AM will be long lived,{quote} I believe the proposal meets those needs; it is one particular way to do so... {quote}it must be set at application creation time and all containers of the app will be considered long lived. This is because the RM does not keep track of individual container requests.{quote} I'm not sure why it matters whether or not the RM keeps track of the container requests. The AM will request containers with scheduling constraints like long lived, affinity, etc., and the RM considers them when selecting nodes; once that completes, the constraints no longer necessarily matter. If the AM needs a relationship between nodes for a request, or a particular type of selection (not on a spot node, for long-running work), it will make a request for those nodes, get nodes that meet its needs, and it's good to go. 
It sounds as though it would be more flexible, meet a wider set of use cases, and therefore be preferable to allow an application master to obtain different types of containers for different purposes during its lifetime, as opposed to forcing it to use only one set of container constraints throughout. {quote}Having a long enum of flag to indicated optional qualities of the requested containers has been discussed in the past (in the context of some JIRAs related to Llama) and it has been discarded as it would mean divergence on the features different schedulers support.{quote} So, for this jira there is a desire to support selecting nodes with particular qualities (not placing a long running process on a temporary/spot instance), and coming soon are other needs for similar selection/constraint logic (affinity, anti-affinity, etc). Not being able to indicate qualities for the containers would keep us from being able to support those needs, and I believe there is a need to support this functionality. It's filtering/constraint/selection logic and could probably be generalized in a way which could be used by various schedulers... Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041086#comment-14041086 ] Craig Welch commented on YARN-1039: --- [~ste...@apache.org] wrt the need for a container level flag / a way for the application master to launch long lived containers: definitely, but the idea was for that to come as a later step. That may be short-sighted, though, as it may be better to come up with a common way to do this now, for both the application master container and the containers it later launches, instead of ending up with unmatched approaches later... This first step is to provide a way for the application master to be launched in a long lived container (generally, an application master for a long lived application will itself need to be launched in a long lived container, or at least it needs to be possible to do so), which is why there needs to be some way to indicate the need for a long lived container during application submission (necessary but not sufficient overall...) [~zjshen] I was also wondering about using the tags, but after talking with [~xgong] we are not thinking that is the way to go, because tags don't seem to be about changing behavior but only a freeform way to enable search/display/etc. After this discussion and some looking around, it really seems that what we are after is a way to communicate a quality of the needed container to the resource manager, both at application submission (for the application master container) and for later container launches by the master, kind of like the ResourceProto, which is already present in both cases for the same reason. (I suggested adding it there, actually, as something necessary for the container, but [~xgong] objected, thinking it is really specific to metric qualities such as cpu, memory...) 
I'm going to take a look at adding something alongside /similar to the ResourceProto to indicate constraints/requirements for the container, starting with long lived, that can be common to application submission and when the containers are started later by the application, not necessarily a long field for bit manipulation but something which is also extensible Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041105#comment-14041105 ] Steve Loughran commented on YARN-1039: -- I see. I'd assume that the service flag would imply long-lived, but maybe they could be separated. I'd like to see a {{long}} enum of flags here, as it's easier to be forwards compatible Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041110#comment-14041110 ] Craig Welch commented on YARN-1039: --- The more I look around, the better I like the idea of adding it to the resource proto. It is the same kind of thing as the items already in there - it's a characteristic required for the container (it isn't a metric style quality, but still, it's a characteristic of the resource needed) and it is already present everywhere the information is needed (at application submission and when containers are requested). Adding something so similar alongside the resource proto seems unnecessary. Do you agree with [~xgong]'s concerns or do you think it makes sense to add it there? Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041652#comment-14041652 ] Vinod Kumar Vavilapalli commented on YARN-1039: --- I am not against a container/resource level definition of whether that container is long lived or not, but I think it is equally important to mark at the application level if _at least_ one container in the application is considered long lived. So, to summarize, how about - an app-level isLongRunning() that indicates _if at least one container of this application will be long-running_ and - a resource-request level isLongRunning() that indicates _if this container is long running or not_. The app-level flag can help UIs, making very quick scheduling distinctions etc. Thoughts? Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
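The two-level proposal in the comment above (an app-level isLongRunning() plus a resource-request level isLongRunning()) could look something like the sketch below. The method name isLongRunning() is from the comment; everything else, including the class names and the accepts() consistency check, is a hypothetical illustration and not the API from any of the attached patches.

```java
/**
 * Hypothetical sketch of a two-level long-running flag.
 * Application and ContainerRequest are illustrative stand-ins, not YARN classes.
 */
final class LongRunningFlags {

    /** Request-level flag: is this particular container long running? */
    static final class ContainerRequest {
        private final boolean longRunning;

        ContainerRequest(boolean longRunning) {
            this.longRunning = longRunning;
        }

        boolean isLongRunning() {
            return longRunning;
        }
    }

    /** App-level flag: will at least one container of this application be long running? */
    static final class Application {
        private final boolean longRunning;

        Application(boolean longRunning) {
            this.longRunning = longRunning;
        }

        boolean isLongRunning() {
            return longRunning;
        }

        /**
         * Assumption for this sketch: a long-running request is only consistent
         * with an application that declared itself long running at submission.
         */
        boolean accepts(ContainerRequest request) {
            return longRunning || !request.isLongRunning();
        }
    }

    public static void main(String[] args) {
        Application batch = new Application(false);
        Application service = new Application(true);
        System.out.println(batch.accepts(new ContainerRequest(true)));
        System.out.println(service.accepts(new ContainerRequest(true)));
        System.out.println(service.accepts(new ContainerRequest(false)));
    }
}
```

Declaring the app-level flag up front is what would let UIs and quick scheduling decisions use it before any container request has been seen; a long-running application may still ask for short-lived containers.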
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039744#comment-14039744 ] Varun Vasudev commented on YARN-1039: - I agree with [~zjshen]. Using the tags field also means we don't have to worry about switching to an enum, as [~cwelch] mentioned in one of the earlier comments. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039976#comment-14039976 ] Steve Loughran commented on YARN-1039: -- # I'd make the long-lived flag a container request, *not the AM launch request*. An AM may wish to indicate that some containers are shortlife, others long-lived. # If the tag approach lets my AM add this request while running with the 2.4 JARs -even though the hint will be ignored- I'm happy. Protobuf may be agile, but the generated proto classes aren't, and working with fields directly is hard to do, introspection brittle. I know that from working with the am restart flag. # Otherwise, I'd like a long64 with bits we can set and read. It's the cross-platform way and would give us a single field for future additions Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
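The single 64-bit field with settable bits requested in the comment above might be handled along these lines. This is a sketch only: the flag names echo qualities mentioned elsewhere in this thread (long-lived, pre-emptible, anti-affine), but the bit positions, the RequestFlags class, and its helper methods are assumptions made for illustration and do not describe an existing YARN field.

```java
/**
 * Hypothetical sketch of request qualities packed into one 64-bit field.
 * Bit assignments are illustrative; nothing here is part of the YARN API.
 */
final class RequestFlags {
    static final long LONG_LIVED  = 1L << 0;
    static final long PREEMPTIBLE = 1L << 1;
    static final long ANTI_AFFINE = 1L << 2;

    /** Every bit this version of the reader understands. */
    static final long KNOWN_MASK = LONG_LIVED | PREEMPTIBLE | ANTI_AFFINE;

    private RequestFlags() {
    }

    static long set(long flags, long bit) {
        return flags | bit;
    }

    static long clear(long flags, long bit) {
        return flags & ~bit;
    }

    static boolean isSet(long flags, long bit) {
        return (flags & bit) != 0;
    }

    /** Bits set by a newer writer that this reader does not know; they can be ignored. */
    static long unknownBits(long flags) {
        return flags & ~KNOWN_MASK;
    }

    public static void main(String[] args) {
        long flags = set(set(0L, LONG_LIVED), ANTI_AFFINE);
        System.out.println(isSet(flags, LONG_LIVED));
        System.out.println(isSet(flags, PREEMPTIBLE));
        // A future hint on bit 40 passes through an old reader untouched.
        System.out.println(Long.toBinaryString(unknownBits(flags | (1L << 40))));
    }
}
```

The forward-compatibility argument is visible in unknownBits(): a reader built against an older set of flags can carry or ignore bits it does not recognise, so adding a hint needs no new field.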
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040001#comment-14040001 ] Zhijie Shen commented on YARN-1039: --- bq. An AM may wish to indicate that some containers are shortlife, others long-lived. A container-level long-lived flag is an interesting idea. If any container of an app is long-lived, the AM container is automatically going to be long-lived as well, right? Presumably the AM should last until the whole app exits. Shall we mark an app long-lived, and then allow a long-lived app to start long-lived containers? bq. If the tag approach lets my AM add this request while running with the 2.4 JARs even though the hint will be ignored I'm happy. If the granularity is going to be the container, the tag may not help, as it is application-level information Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039152#comment-14039152 ] Craig Welch commented on YARN-1039: --- Not to make things overly complex, but we were talking about making this an enum rather than a simple boolean, with the notion that this is one of a number of possible scheduler hints we may want to support - with values like PERSISTENT, TRANSIENT, RELOCATABLE, etc (a single value or possibly a list/set of values, for cases which are not mutually exclusive). Thoughts? Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
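The enum-of-hints idea in the comment above, carrying a set of values rather than a single boolean, could be sketched as below. The value names PERSISTENT, TRANSIENT and RELOCATABLE are taken from the comment; the SchedulerHints class, the rule that PERSISTENT and TRANSIENT exclude each other, and the mapping from PERSISTENT to "avoid spot-priced nodes" are assumptions for illustration, loosely following the issue description.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical sketch of scheduler hints as a set of enum values.
 * The semantics attached to each value are assumptions, not agreed YARN behavior.
 */
final class SchedulerHints {

    enum Hint { PERSISTENT, TRANSIENT, RELOCATABLE }

    private SchedulerHints() {
    }

    /** Assumption: PERSISTENT and TRANSIENT are mutually exclusive; RELOCATABLE combines with either. */
    static boolean isConsistent(Set<Hint> hints) {
        return !(hints.contains(Hint.PERSISTENT) && hints.contains(Hint.TRANSIENT));
    }

    /** Assumption: a PERSISTENT container should not land on a transient (spot-priced) node. */
    static boolean mayUseSpotNode(Set<Hint> hints) {
        return !hints.contains(Hint.PERSISTENT);
    }

    public static void main(String[] args) {
        Set<Hint> service = EnumSet.of(Hint.PERSISTENT, Hint.RELOCATABLE);
        Set<Hint> batch = EnumSet.of(Hint.TRANSIENT);
        System.out.println(isConsistent(service) + " " + mayUseSpotNode(service));
        System.out.println(isConsistent(batch) + " " + mayUseSpotNode(batch));
        System.out.println(isConsistent(EnumSet.of(Hint.PERSISTENT, Hint.TRANSIENT)));
    }
}
```

Compared with a lone boolean, a set leaves room for further hints without another schema change, at the cost of having to define which combinations are valid.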
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039518#comment-14039518 ] Craig Welch commented on YARN-1039: --- I went ahead and just added a boolean flag - there does seem to be room to generalize this in the future but at the moment it's not entirely clear to me how best to do that / that there are enough examples to do it properly. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039557#comment-14039557 ] Hadoop QA commented on YARN-1039: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651767/YARN-1039.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/4035//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4035//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4035//console This message is automatically generated. 
Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039613#comment-14039613 ] Hadoop QA commented on YARN-1039: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651782/YARN-1039.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4036//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4036//console This message is automatically generated. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch A container request could support a new parameter long-lived. 
This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039648#comment-14039648 ] Hadoop QA commented on YARN-1039: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651791/YARN-1039.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4038//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4038//console This message is automatically generated. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. 
This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039679#comment-14039679 ] Zhijie Shen commented on YARN-1039: --- Can we make use of the tag in the application submission context directly, instead of adding a dedicated field? Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038041#comment-14038041 ] Steve Loughran commented on YARN-1039: -- marking as depended on by YARN-896. I would keep the affinity logic separate, as discussed in YARN-1042 Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038248#comment-14038248 ] Vinod Kumar Vavilapalli commented on YARN-1039: --- For now, we can start with a parameter on the ApplicationSubmissionContext - we are still figuring out long-running services before delving into enabling a smaller subset of long-lived containers within a larger application.. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998396#comment-13998396 ] Xuan Gong commented on YARN-1039: - Start to work on it. Will provide a proposal soon. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Xuan Gong Priority: Minor A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)