[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate "long lived"

2016-02-23 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159727#comment-15159727
 ] 

Vinod Kumar Vavilapalli commented on YARN-1039:
---

Moved this to be a sub-task of YARN-4692 given the renewed focus there.

> Add parameter for YARN resource requests to indicate "long lived"
> -
>
> Key: YARN-1039
> URL: https://issues.apache.org/jira/browse/YARN-1039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0, 2.1.1-beta
>Reporter: Steve Loughran
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch
>
>
> A container request could support a new parameter "long-lived". This could be 
> used by a scheduler that would know not to host the service on a transient 
> (cloud: spot priced) node.
> Schedulers could also decide whether or not to allocate multiple long-lived 
> containers on the same node



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-06-14 Thread Carlo Curino (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584957#comment-14584957
 ] 

Carlo Curino commented on YARN-1039:


Craig, can you comment on which properties of service vs. batch containers
you are alluding to, besides one having an infinite duration while the other
is expected to have a clear completion time?

In my mind, if the only property we are capturing is time-to-completion, then
we should just use duration, which is inherently more flexible and which we
want for other things anyway.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-06-12 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584271#comment-14584271
 ] 

Craig Welch commented on YARN-1039:
---

I'll go back to my earlier assertion: I don't think duration is what we are
really concerned with here (that is covered in various ways in other places),
but rather the notion of an application type, batch or service. The defining
characteristic is the potential for continuous operation (service) versus a
unit of work which will run to completion (batch), and an enumeration of
service and batch makes sense to me. In any case, [~vinodkv], there still
seems to be enough diversity of opinion here to require some ongoing
discussion/reconciliation, so I will leave this in your capable hands.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-05-29 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564677#comment-14564677
 ] 

Steve Loughran commented on YARN-1039:
--

One aspect of a flag is that it could be used in schedulers, not just when 
placing/scheduling containers with the bit set, but when looking at where to 
place new work. Example: if all the containers on a single host are tagged as 
long-lived, there's little point in waiting for a free space to appear there 
before downgrading to launching a container requested against that host 
elsewhere.
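
A minimal sketch of the placement check described above, assuming a
hypothetical HostContainer type that carries the proposed long-lived bit.
Neither the type nor the method is part of YARN or of the attached patches;
the sketch only shows the decision itself: if a host is occupied entirely by
long-lived containers, a scheduler could skip the locality wait and relax the
request right away.

```java
import java.util.List;

/** Hypothetical view of a container running on a host; not a YARN class. */
final class HostContainer {
    final String id;
    final boolean longLived; // the proposed "long-lived" bit

    HostContainer(String id, boolean longLived) {
        this.id = id;
        this.longLived = longLived;
    }
}

final class LocalityWait {
    /**
     * Returns true when waiting for space on the requested host is unlikely
     * to pay off: the host is in use and every container on it is long-lived.
     */
    static boolean shouldSkipWait(List<HostContainer> runningOnHost) {
        if (runningOnHost.isEmpty()) {
            return false; // an idle host needs no wait at all
        }
        for (HostContainer c : runningOnHost) {
            if (!c.longLived) {
                return false; // a batch container may free space soon
            }
        }
        return true;
    }
}
```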

 Add parameter for YARN resource requests to indicate long lived
 -

 Key: YARN-1039
 URL: https://issues.apache.org/jira/browse/YARN-1039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
 Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch


 A container request could support a new parameter long-lived. This could be 
 used by a scheduler that would know not to host the service on a transient 
 (cloud: spot priced) node.
 Schedulers could also decide whether or not to allocate multiple long-lived 
 containers on the same node




[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-05-18 Thread Chris Douglas (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549505#comment-14549505
]

Chris Douglas commented on YARN-1039:
-

The semantics of a boolean flag are opaque. The policies enforced by different
RM configurations (and versions) will not be- and cannot be made to be-
consistent. Application and container priority are already encoded (or in
progress, YARN-1963), so it's not just preemption priority or cost. Affinity
and anti-affinity are also covered by different features. Discussion has been
wide-ranging because it is unclear what long-lived guarantees across existing
features (beyond removing the progress bar from the UI, which I hope we can
stop mentioning).

An implementation that only recognizes infinite and undefined leases could be
mapped into duration. Lease duration could also be used to communicate when
security tokens cannot be renewed, short-lived guarantees for YARN-2877
containers, boundaries of YARN-1051 reservations, and planned decommissioning.
In contrast, the long-lived flag cannot be used for these cases. We could
expose probabilistic guarantees (which are what we give in reality), but that's
a later issue.

Considering the blockers more concretely:
bq. (a) reservations (b) white-listed requests or (c) node-label requests
getting stuck on a node used by other services' containers that don't exit.

Aren't these handled by adding a timeout to allocations, which would also catch
cases where this flag is _not_ set? The timeout value could be set across the
scheduler to start, but could even be user-visible in later versions...

All said, I don't have time to work on this, agree the API can be evolved from
the flag, and am -0 on it.
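
The allocation timeout suggested above could look roughly like this. It is a
sketch under assumptions: PendingPlacement and its fields are hypothetical
names, and the scheduler-wide timeout value is whatever an operator would
configure. Nothing here is taken from the YARN scheduler code.

```java
/** Hypothetical record of a request waiting for space on a specific node. */
final class PendingPlacement {
    private final long requestedAtMillis;
    private final long timeoutMillis; // scheduler-wide to start with

    PendingPlacement(long requestedAtMillis, long timeoutMillis) {
        this.requestedAtMillis = requestedAtMillis;
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * True once the request has waited at least the timeout, regardless of
     * whether the containers blocking it declared themselves long-lived.
     */
    boolean hasTimedOut(long nowMillis) {
        return nowMillis - requestedAtMillis >= timeoutMillis;
    }
}
```

Because the check never looks at a flag, it also covers services that forgot
to set one, which is the case raised in the comment.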


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-05-13 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541612#comment-14541612
 ] 

Steve Loughran commented on YARN-1039:
--

+1 for a long-lived bit. Services can set the flag, and it is up to future
versions of Hadoop to implement the logic to go with it.

FWIW, I'd make the first use of the patch the YARN-1079 progress bar.

Why? It's the least amount of server-side code changes (no scheduling patches),
it fixes a tangible problem for users (the progress bar is confusing), and it
provides an immediate benefit to the apps, so encouraging them to set the flag,
maybe even by reflection if they want to stay compatible across Hadoop versions.
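
Setting the flag by reflection, as suggested above, might be sketched as
follows. The setter name passed in (for example "setLongLived") is
hypothetical, since the thread had not settled on an API; the helper only
shows the pattern of calling a method if the running library version has it
and carrying on if it does not.

```java
import java.lang.reflect.Method;

final class OptionalSetter {
    /**
     * Invokes a public single-boolean setter by name if the target has one.
     * Returns true if the call was made, false if the method is absent
     * or could not be invoked (e.g. on an older library version).
     */
    static boolean trySet(Object target, String setterName, boolean value) {
        try {
            Method setter = target.getClass().getMethod(setterName, boolean.class);
            setter.invoke(target, value);
            return true;
        } catch (ReflectiveOperationException e) {
            return false; // method missing or not callable: skip the flag
        }
    }
}
```

An application would call something like
OptionalSetter.trySet(request, "setLongLived", true) and simply ignore a
false result.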


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-05-13 Thread Carlo Curino (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541487#comment-14541487
 ] 

Carlo Curino commented on YARN-1039:


I agree this conversation has been all over the map. Thanks for instigating
convergence.

I favor the duration as it easily covers the boolean use case, and gives us a
little extra information bandwidth (i.e., it accommodates a few upcoming use
cases with no changes).
However, I understand where the pushback would come from, and I can't argue too
much against keeping things more minimal to start.





[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541129#comment-14541129
]

Vinod Kumar Vavilapalli commented on YARN-1039:
---

*sigh* This JIRA was all over the place.

Can we please agree not to discuss here *how* the scheduling features, UI,
log aggregation, and security tokens related to long-running services should
be implemented? There are separate JIRAs with good progress on each of them.

Let's also please not discuss how the platform _could_ make use of the notion
of a long-lived nature of an application/container. I understand that the type
of usage shall dictate what the input will look like, but hold on to that for a
second.

h3. Blocker
I've already started seeing real-life situations where we need the RM to know
about the long-lived'ness of a container and an application. The prominent
ones are (a) reservations, (b) white-listed requests, or (c) node-label
requests getting stuck on a node used by other services' containers that don't
exit.

Absence of this notion is increasingly becoming a *blocker* for running
services. I'd like to get some progress here.

h3. Short Proposal

There seems to be general agreement on having the notion itself. Here are the
proposals and dimensions:
# The notion at app level, at per-container level
# A boolean flag, an enum, a duration

I propose that we solve the blocker use-case that I pointed out above with a
boolean at both app-level and container-level. Tomorrow, when somebody
implements a duration based bin-packing scheduling policy, they can add in the
notion of a duration and then reconcile the boolean with infinity values on the
duration. The enum proposal is to me a dup of YARN-3409, which covers a much
larger problem space.

Thoughts?
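
The reconciliation mentioned above, a boolean now and a duration later, could
be sketched like this. ExpectedLifetime and its INFINITE sentinel are
hypothetical names invented for illustration; the point is only that the
boolean maps cleanly onto "infinite" versus "unspecified".

```java
import java.time.Duration;
import java.util.Optional;

final class ExpectedLifetime {
    /** Sentinel standing in for "runs until stopped"; an assumption of this sketch. */
    static final Duration INFINITE = Duration.ofSeconds(Long.MAX_VALUE);

    /** Maps the boolean flag onto a duration: true is infinite, false is unspecified. */
    static Optional<Duration> fromFlag(boolean longLived) {
        return longLived ? Optional.of(INFINITE) : Optional.empty();
    }

    /** Recovers the boolean for callers that only understand the flag. */
    static boolean toFlag(Optional<Duration> lifetime) {
        return lifetime.isPresent() && lifetime.get().equals(INFINITE);
    }
}
```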


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-02-09 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312096#comment-14312096
 ] 

Sunil G commented on YARN-1039:
---

Duration is a better metric than token names. However, to arrive at this
duration metric, a few trial runs of the application are needed, OR new
container requests can be raised by the AM based on the running time of its
previous containers.

So a feedback mechanism to the AM is emerging here from the RM's perspective:
the AM is supposed to run a container for a given duration, and once that
limit is crossed, the AM can take some action. I feel this will add a good
amount of flexibility.
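
One way an AM could derive a duration from its earlier containers is sketched
below. RuntimeHistory is a hypothetical helper, and using a plain mean of past
runtimes is an assumption of this sketch rather than something proposed in the
thread.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical AM-side record of how long finished containers ran. */
final class RuntimeHistory {
    private final List<Duration> observed = new ArrayList<>();

    void recordCompleted(Duration runtime) {
        observed.add(runtime);
    }

    /** Mean of the observed runtimes, or empty before any container has finished. */
    Optional<Duration> estimate() {
        if (observed.isEmpty()) {
            return Optional.empty();
        }
        long totalMillis = 0;
        for (Duration d : observed) {
            totalMillis += d.toMillis();
        }
        return Optional.of(Duration.ofMillis(totalMillis / observed.size()));
    }

    /** True when a running container has outlived the current estimate. */
    boolean exceeded(Duration elapsed) {
        return estimate().map(e -> elapsed.compareTo(e) > 0).orElse(false);
    }
}
```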


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-02-07 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310683#comment-14310683
 ] 

Steve Loughran commented on YARN-1039:
--

Specifying some range of likely duration may work... certainly if something
takes very much longer than expected, that's potentially a warning that
something has gone wrong... though really the AM should be handling that.

For anyone implementing pre-emption in a scheduler, how would longevity flags
be interpreted? As a hint that containers won't be going away any time soon, so
that pre-emption is the best strategy for scheduling other work?


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-02-07 Thread Carlo Curino (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311146#comment-14311146
 ] 

Carlo Curino commented on YARN-1039:


An idea of how long a container will last would be very useful for
preemption. The ProportionalCapacityPreemptionPolicy currently needs to guess
how many containers would naturally complete before the preemption action
happens (to avoid over-shooting). The information about container durations
(even if rough) would make this a much more informed guess. Again, this is for
optimization purposes, not correctness, so we can tolerate a fair bit of error.
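
To make the "informed guess" concrete, here is a sketch of how rough
remaining-duration estimates could reduce over-shooting. It is not how
ProportionalCapacityPreemptionPolicy works; the class, the method names, and
the convention that a negative value means "unknown or unbounded" are all
assumptions for illustration.

```java
final class NaturalCompletion {
    /**
     * Counts containers whose estimated remaining time fits inside the window
     * before preemption would act. Negative estimates are treated as unknown
     * or unbounded and never counted.
     */
    static int expectedToFinish(long[] remainingMillis, long windowMillis) {
        int finishing = 0;
        for (long remaining : remainingMillis) {
            if (remaining >= 0 && remaining <= windowMillis) {
                finishing++;
            }
        }
        return finishing;
    }

    /** Containers still worth preempting after natural completions are discounted. */
    static int toPreempt(int needed, long[] remainingMillis, long windowMillis) {
        return Math.max(0, needed - expectedToFinish(remainingMillis, windowMillis));
    }
}
```

Since the estimates only trim the preemption count, a wrong estimate costs
some efficiency but does not break correctness, which matches the tolerance
for error noted in the comment.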


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-02-06 Thread Carlo Curino (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310328#comment-14310328
 ] 

Carlo Curino commented on YARN-1039:


Tossing some fire back on duration. I read your concerns about applications'
ability to provide good values; however, I'd rather have the app provide its
best duration estimate (and the framework round it or bucket it) than have the
app provide a coarse-grained tag-based version in the first place.

Changing cluster configurations and policies might turn what used to be a
short task into something not that short, which we want to handle differently,
and so on. In a sense, asking for duration prevents us from relying on what
the application will judge as short/long, etc.

As another example, based on whatever mechanisms for log aggregation we will
have in the future, we can change our mind about what the cut-points for
short/long are. For example, because a new technique makes it very cheap and
we want to provide much more frequent feedback to users.

Bottom line, I find duration a rather neutral thing to ask for, vs. something
which is more opinion-based, and corner cases like never-ending services are
easily handled with -1 or +inf values.

I also agree that there are many other use cases for tags that emerged in the
discussion, which have a clear value and are by no means covered by duration.
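
The bucketing argument above can be illustrated with a short sketch. The
DurationBuckets class, its single cut-point, and the use of Long.MAX_VALUE as
a stand-in for +inf are assumptions made here; the thread only states that the
framework, not the application, should own the cut-points.

```java
/** Hypothetical framework-side bucketing of an app-supplied duration estimate. */
final class DurationBuckets {
    enum Bucket { SHORT, LONG, UNBOUNDED }

    private final long shortCutoffMillis; // owned by the framework, may change over time

    DurationBuckets(long shortCutoffMillis) {
        this.shortCutoffMillis = shortCutoffMillis;
    }

    /** Negative values (e.g. -1) and Long.MAX_VALUE both mean "never-ending". */
    Bucket classify(long estimateMillis) {
        if (estimateMillis < 0 || estimateMillis == Long.MAX_VALUE) {
            return Bucket.UNBOUNDED;
        }
        return estimateMillis <= shortCutoffMillis ? Bucket.SHORT : Bucket.LONG;
    }
}
```

The same ten-minute estimate lands in LONG under a five-minute cut-point and
in SHORT under a fifteen-minute one, without the application changing what it
reports.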




[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-29 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297783#comment-14297783
 ] 

Craig Welch commented on YARN-1039:
---

[~chris.douglas]

bq.  YARN shouldn't understand the lifecycle for a service or the 
progress/dependencies for task containers

That's not necessarily so, there are some cases where the type of life cycle 
for an application is important, for example, when determining whether or not 
it is open-ended (service) or a batch process which entails a notion of 
progress (session), at least for purposes of display.

I think we need to rescope and clarify this jira a bit so that we can make
progress - there are a number of items in the original problem statement and
subsequent comments which have been taken on elsewhere and so really no longer
make sense to pursue here.  Here's an attempt at a breakdown:

bq. This could be used by a scheduler that would know not to host the service 
on a transient (cloud: spot priced) node

I think this is now clearly covered by [YARN-796]. Nodes having qualities
(including operational qualities such as these) is one of the core purposes of
that work, so it makes no sense to duplicate it here, and it should be
de-scoped from this jira.

bq. Schedulers could also decide whether or not to allocate multiple long-lived 
containers on the same node

As [~ste...@apache.org] mentioned in an earlier comment
[https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14038041&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14038041],
affinity / anti-affinity is covered in a more general sense in [YARN-1042].
The above component of this jira is really just such a case, and so it should
be covered with that general solution and dropped from scope as well.  There
may be some interest in informing that solution based on a generalized
"service" setting, but to really understand that, the affinity approach needs
to be worked out. I think the affinity approach will really need to
inform/integrate with this rather than the other way around, and integration
should be approached as part of that effort.

That leaves nothing, so we can close the jira ;-)  Not quite, there were 
several things added in comments:

Token management - handled in [YARN-941]

Scheduler hints not related to node categories or anti-affinity (opportunistic
scheduling, etc.) - this does strike me as something better handled via the
duration route et al. [YARN-2877] [YARN-1051] and not something which needs to
be replicated here.

I think that really just leaves the progress bar (and potentially other display
related items).  This is covered by [YARN-1079].  I suggest, then, that we
either rescope this jira to providing the lifecycle information as an
application tag
[https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14039679&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14039679]
as suggested by [~zjshen] early on, or close it and cover the work as part of
[YARN-1079].  I originally objected to that approach on the basis that tags
appeared to be a display type feature which did not fit this effort, but if
rescoped as I'm proposing, it becomes such a feature, and I think that approach
is now a good fit.

Thoughts?



[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-29 Thread Chris Douglas (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298135#comment-14298135
 ] 

Chris Douglas commented on YARN-1039:
-

bq. That's not necessarily so, there are some cases where the type of life 
cycle for an application is important, for example, when determining whether or 
not it is open-ended (service) or a batch process which entails a notion of 
progress (session), at least for purposes of display.

That's a fair distinction. Would you agree the YARN _scheduler_ should not use 
detailed information about progress, task dependencies, or service lifecycles? 
If an AM registers with a tag that affects the attributes displayed in 
dashboards, then issues like YARN-1079 can be resolved cleanly, as you and 
Zhijie propose.

Steve has a point about mixed-mode AMs that run both long and short-lived 
containers (e.g., a long-lived service supporting a workflow composed of short 
tasks). If it's solely for display, then an enum seems adequate, but I'd like 
to better understand the use cases.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-27 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294557#comment-14294557
 ] 

Craig Welch commented on YARN-1039:
---

[~chris.douglas] what's the proper duration for a service which does not have a
pre-defined lifetime?

This distinction is not really about "how long will it run" but more about
"what is the lifecycle of this app". As [~ste...@apache.org] points out, is it
session or batch oriented (something which has a defined set of work, so it has
a notion of progress to completion), or is it a running process with an
indeterminate/unknown lifetime which handles whatever work is sent its way (a
service)?  This is really the distinction needed here: it's a qualitative
difference regarding a lifecycle, and the notion of an enumeration of lifecycle
types makes sense for this.  Users will often have no idea how long their
application will run, but they will generally have a clear notion of its
lifecycle.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-27 Thread Chris Douglas (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294529#comment-14294529
 ] 

Chris Douglas commented on YARN-1039:
-

Requiring accurate estimates is not realistic, but no service runs forever in 
the same container(s). If container leases can be renewed/refreshed, that's a 
manageable and realistic guarantee for the user (couldn't find a JIRA; it must 
exist). Migration, decommission, OS upgrades, and other operations-in-time on 
containers seem necessary to support long-running services, since preemption is 
comparably heavy-handed. Specifying a precise duration may be a little pedantic 
for the existing use cases, but it seems like the right abstraction.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-27 Thread Chris Douglas (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294613#comment-14294613
]

Chris Douglas commented on YARN-1039:
-

[~cwelch] YARN shouldn't understand the lifecycle for a service or the
progress/dependencies for task containers. As proposed, an AM will receive a
lease on a container for some duration. Before the lease expires, it can
relinquish the lease or request that it be renewed. While this adds some
complexity in the AM implementation- it needs to track and renew its container
leases- it's mostly library code that admits straightforward, naive
implementations. The most obvious strawman would request all resources at the
longest possible duration and always renew.

Mapping an enumeration expressing an AM lifecycle into a policy for requesting,
refreshing, and managing resources is an excellent client-side abstraction.
Even if an implementation of YARN only receives (and only issues) leases from a
fixed set of values, the underlying abstraction can admit arbitrary durations.
An enumeration is a good API for applications, but the RM framework could
have a more fine-grained substrate.

Leases actually help services run under YARN. By way of example, refusing to
renew a lease could signal that the node will be decommissioned, or that some
cluster-wide invariant- like balanced utilization or fairness- is better met by
(re)moving that container. Refusing to renew a lease- or renewing it for a
shorter period- could signal the service to request new containers.
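
The strawman described above (always request the longest lease and always
renew) could be sketched as follows. LeaseRenewer and AlwaysRenew are
hypothetical names; YARN had no container lease renewal API in this thread, so
this illustrates the proposed protocol rather than any existing interface.

```java
import java.util.OptionalLong;

/** Hypothetical RM-side hook: an empty result means the renewal was refused. */
interface LeaseRenewer {
    OptionalLong renew(String containerId, long requestedMillis);
}

final class AlwaysRenew {
    enum Action { KEEP, REQUEST_REPLACEMENT }

    private final LeaseRenewer renewer;
    private final long maxLeaseMillis;

    AlwaysRenew(LeaseRenewer renewer, long maxLeaseMillis) {
        this.renewer = renewer;
        this.maxLeaseMillis = maxLeaseMillis;
    }

    /**
     * Strawman AM policy: always ask for the longest lease. A refusal or a
     * shorter grant is read as a signal to line up a replacement container.
     */
    Action onLeaseExpiring(String containerId) {
        OptionalLong granted = renewer.renew(containerId, maxLeaseMillis);
        if (granted.isPresent() && granted.getAsLong() >= maxLeaseMillis) {
            return Action.KEEP;
        }
        return Action.REQUEST_REPLACEMENT;
    }
}
```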


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-21 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285449#comment-14285449
 ] 

Steve Loughran commented on YARN-1039:
--

bq. However, this can be done either based on historical information (previous 
waves of this task type or previous execution of the same job) or on 
application level knowledge.

Historical information is generally the best estimate, though if the input data 
is different, the duration can be too.

Maybe a simple enum of short-lived, session, and service: services 
provide no termination, session = a few hours to a few days (i.e., within the 
lifespan of Kerberos tokens). 
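
A minimal sketch of such an enum, assuming a client-side mapping from each value to an expected upper bound. The name `Lifetime` and the concrete bounds are illustrative assumptions; the comment above only gives "a few hours to a few days" for sessions.

```python
from enum import Enum
from typing import Optional


class Lifetime(Enum):
    SHORT_LIVED = "short-lived"
    SESSION = "session"
    SERVICE = "service"


# Illustrative upper bounds in seconds; the numbers are assumptions.
_MAX_SECONDS = {
    Lifetime.SHORT_LIVED: 60 * 60,        # assumed: up to an hour
    Lifetime.SESSION: 3 * 24 * 60 * 60,   # "a few hours to a few days"
    Lifetime.SERVICE: None,               # no expected termination
}


def expected_max_seconds(lifetime: Lifetime) -> Optional[int]:
    """Return the assumed upper bound, or None for open-ended services."""
    return _MAX_SECONDS[lifetime]
```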




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-20 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14283734#comment-14283734
 ] 

Steve Loughran commented on YARN-1039:
--

I've always envisaged the flag could switch on some different policies, though 
with container-preservation across restarts, labels, log aggregation and 
windows for failure tracking, much of that is dealt with.

Otherwise, the longevity flag could be of use in
# RM UI. There's no percentage done any more, more live/not-live. This 
already causes confusion for our slider users.
# placement: do you want 100% of a node capacity to be for long-lived stuff, at 
the expense of being able to run anything short-lived there?
# pre-emption. The cost of pre-emption may be higher, but at the same time 
long-lived containers are the ones you may want to pre-empt the most, because 
the scheduler knows they won't go away any time soon.

The easy target is the UI, as that doesn't need scheduling changes, and the 
current percentage done view doesn't work. Something to indicate live/not 
live makes more sense (though not red/green unless you don't want colour blind 
people using your app)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-20 Thread Carlo Curino (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284104#comment-14284104
 ] 

Carlo Curino commented on YARN-1039:


I am happy the conversation is re-ignited. As I mentioned [above | 
https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14048345&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14048345],
 the long-lived tag is a coarse-grained version of the notion of duration 
we added to the ReservationRequest (which tracks ResourceRequest very closely) 
in YARN-1051. 

The idea is that the AM could provide an estimate of the task duration, 
enabling (beyond what Steve already listed above) optimistic scheduling 
decisions like the ones in YARN-2877 for very short tasks (we ran several 
experiments and the potential for increased utilization is substantial). Given 
a duration parameter, long-lived can be expressed by setting duration 
to a large value (or MAX_INT, or -1, or whatever convention).
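
As a rough illustration of how a duration field could subsume the boolean flag, here is a sketch that assumes -1 as the "long-lived" convention. The sentinel, the function name, and the one-day threshold are all assumptions made for the example; the comment above leaves the convention open.

```python
LONG_LIVED = -1  # assumed sentinel; the thread does not fix a convention


def is_long_lived(duration_s: int, threshold_s: int = 24 * 60 * 60) -> bool:
    """Treat the sentinel, or any estimate at or above the threshold, as long-lived."""
    if duration_s == LONG_LIVED:
        return True
    if duration_s < 0:
        raise ValueError("duration must be non-negative or LONG_LIVED")
    return duration_s >= threshold_s
```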






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-20 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284325#comment-14284325
 ] 

Craig Welch commented on YARN-1039:
---

Another thought: if we do need this kind of flag, I think we should detach 
the notion from duration or long life as such. I think it's more about 
service vs batch. A service's duration is not necessarily related to any 
preset notion of a work item it will start, work on, and complete. It is 
started to handle work which is given to it, of unknown quantity (potentially 
many different items), and stopped when no longer needed. It's not so much 
about the duration as the lifecycle (a batch operation may have a longer 
runtime than a service, for example). So, I'd suggest dropping the temporal 
flavor and going with service vs batch, or something along those lines.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-20 Thread Jian Fang (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284446#comment-14284446
]

Jian Fang commented on YARN-1039:
-

The duration concept comes with good intentions, but what I am really afraid
of is that it could introduce a huge amount of complexity to YARN if it is not
designed properly.

First, there are so many moving parts under the hood for the estimation. For
example, the run time on a 30-node cluster may be significantly different from
that on a 300-node cluster. Getting into the measurement and estimation
business is very much like walking into the benchmark comparison business,
which is very hard in reality.

Secondly, the duration probably relies on Hadoop customers to provide a proper
value if YARN is not smart enough to derive it by itself, which could be
impractical for many customers. Remember that many Hadoop users are not even
developers. Many of them rely on high-level components such as Pig and Hive to
run Hadoop jobs. They probably don't know or care about the estimation.

As a result, the duration should at most be an enhancement that applies when
the value is provided. YARN should still work properly without such a value.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-20 Thread Konstantinos Karanasos (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284920#comment-14284920
 ] 

Konstantinos Karanasos commented on YARN-1039:
--

Let me add my thoughts regarding whether we should allow duration to be 
reported instead of just a boolean switch for short tasks.
I am actively involved in adding distributed scheduling capabilities 
([YARN-2877]). We have performed an extensive experimental evaluation that has 
shown significant performance improvements in terms of throughput and latency, 
especially when short tasks are concerned. In that scenario, having the ability 
to specify the duration of the task is crucial (for deciding what type of 
container to use [[YARN-2882]], for estimating the waiting time in the NMs 
[[YARN-2886]], etc.).

I understand the concerns that have been raised about how to properly provide 
the right task duration. However, this can be done either based on historical 
information (previous waves of this task type or previous execution of the same 
job) or on application level knowledge.
We are already experimenting with ways to deal with imprecise task 
durations.

That said, I definitely agree with [~john.jian.fang] that the user should not 
*have to* provide any task duration (i.e., the system should work properly in 
case no durations are provided), but on the other hand, in case she does, we 
should be able to take advantage of it.
Moreover, as [~curino] pointed out, if the API exposes an integer instead of a 
boolean, we can simulate the boolean switch (e.g., by setting the value to 
MAX_INT for long tasks), but if we simply use a boolean, we would have to 
change the API in the future to support duration.
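
The "optional hint" idea can be sketched as below: when no duration is given the scheduler behaves as it does today, and a provided duration only refines the decision. The container type names `"guaranteed"` and `"queueable"` and the 10-second cutoff are placeholders chosen for this example, not values taken from YARN-2882.

```python
from typing import Optional


def choose_container_type(duration_s: Optional[int],
                          short_task_cutoff_s: int = 10) -> str:
    """Pick a container type from an optional duration hint."""
    if duration_s is None:
        return "guaranteed"    # no hint: fall back to default behavior
    if duration_s <= short_task_cutoff_s:
        return "queueable"     # short task: candidate for distributed scheduling
    return "guaranteed"
```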




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-20 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284290#comment-14284290
 ] 

Wangda Tan commented on YARN-1039:
--

For task placement of long-lived requests, YARN-796 could take care of deciding 
which instance should serve a specific long-lived request. Users can either 
manually specify the label they want for such long-lived containers, or add 
rules on the scheduler side to add labels automatically to such 
long-lived requests.
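
A scheduler-side rule of that kind might look like the following sketch. The `Request` type and the label name `"stable"` are hypothetical; the real node-label expression API from YARN-796 is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Request:
    """Simplified stand-in for a container request (hypothetical)."""
    long_lived: bool = False
    labels: Set[str] = field(default_factory=set)


def apply_label_rules(req: Request, stable_label: str = "stable") -> Request:
    """Attach a node label to long-lived requests that did not choose one."""
    if req.long_lived and not req.labels:
        req.labels.add(stable_label)
    return req
```

A label that the user set manually is left untouched, so the automatic rule only fills in a default.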




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-20 Thread Jian Fang (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284232#comment-14284232
]

Jian Fang commented on YARN-1039:
-

Thanks Steve for your clarification. It seems the long-lived concept makes
sense now if this flag is associated with a policy switch in YARN.

I think the above is only one part of the story. The cluster infrastructure
itself is probably another part that we need to consider, such as the spot
instance feature in EC2 mentioned in this JIRA.

The long-lived concept should have more impact on Hadoop clusters in a cloud
environment. For example, instance type could affect container scheduling.
We should also take this concept into consideration for elastic features
such as graceful expansion and shrinking of a cluster in the cloud.

On the other hand, I still think YARN-796 should be used together with the
long-lived concept. For example, how would the resource manager know which
instance should run a long-lived daemon/task? There should be a mapping between
the long-lived concept and the tags/labels provided by instances. Right?


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-20 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284288#comment-14284288
 ] 

Wangda Tan commented on YARN-1039:
--

I agree with Carlo on this point. Duration can cover both long-lived and 
short-lived. It may be hard to estimate the exact running time of a container, 
but a rough estimate can help the scheduler make better decisions and provide 
the corresponding information to the user, as mentioned by Steve.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-20 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284304#comment-14284304
 ] 

Craig Welch commented on YARN-1039:
---

As I understand it (and I may be wrong on this...), the original intent of this 
jira was to provide a boolean switch to control a set of behaviors expected 
to be important for a long running service: among other things, what sort of 
nodes to schedule on and how to handle logs. This could be on a sliding scale 
based on duration, but I'm not sure that works so well: at what duration do we 
start to change how we handle logs and/or where we schedule things? While 
related, I think that converting this from a boolean to a range will make it 
more difficult to use it for the intended usecase. I also think that packing 
together all of these behaviors into one parameter might be a negative overall.

I do think, to [~john.jian.fang]'s point, that using this to determine 
where to schedule tasks to avoid spot instances and the like has really been 
superseded by Node Labels, and I do not think we should add additional 
functionality for that here. Node Labels is really the way to handle that part 
of the usecase. That leaves, potentially among other things, 
affinity/anti-affinity issues (not scheduling long running tasks 
together/scheduling them together) and log handling (how do we tell the system 
we want log handling for a long running service, if, in fact, the system needs 
to be told that).

I submit that it would be better to have separate solutions 
to each of these needs which can be bundled together to achieve the overall 
usecase, as I think that will provide better control without adding too much 
complexity for the end user. That means we would break this out into 
affinity/anti-affinity and logging configuration. We could always have a 
single parameter (like this one) which sets the others for convenience. I'm 
not sure we'll actually need it, but I do think that splitting out the bundled 
functionality into individual items (some of which may already be being worked 
on elsewhere) is the way to go.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-19 Thread Jian Fang (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14283147#comment-14283147
]

Jian Fang commented on YARN-1039:
-

The container request could specify what tags/labels it requires, right?
A tag/label is not really a resource but an attribute instead IMO, just like
short-lived and long-lived.

Even if you could specify long-lived or short-lived, the resource manager still
needs to translate that into something meaningful, right? Or are you saying
that you have some specific logic in YARN to handle long-lived containers? If
that is true, then it is a different story. Could you please elaborate a bit
more on how long-lived is defined in YARN and what kinds of specific handling
exist? Thanks.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-19 Thread Jian Fang (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14283074#comment-14283074
]

Jian Fang commented on YARN-1039:
-

The term long-lived relies on the resource manager to understand what
long-lived means. How would that be defined in the resource manager? Do you
still rely on node managers to provide tags/labels and the resource manager to
understand them? If that is true, shouldn't YARN-796 have already addressed
this issue with a more generic way to schedule containers based on tags/labels?

Personally, I think YARN-796 is more generic. Take the spot instance mentioned
here as an example: customers don't want to schedule AM containers on spot
instances either, not just long-lived tasks.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-19 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14283083#comment-14283083
 ] 

Hadoop QA commented on YARN-1039:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651791/YARN-1039.3.patch
  against trunk revision 4a44508.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6361//console

This message is automatically generated.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2015-01-19 Thread Xuan Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14283096#comment-14283096
 ] 

Xuan Gong commented on YARN-1039:
-

[~jfeng]
bq. If that is true, shouldn't YARN-796 have already addressed this issue with 
a more generic way to schedule containers based on tags/labels?
Yes, it does. But node labeling is for resource scheduling. I think 
that the application should also identify itself as long-lived or short-lived. 
If not, how would the RM figure out which resources it should assign to this 
application? 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-30 Thread Carlo Curino (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048345#comment-14048345
 ] 

Carlo Curino commented on YARN-1039:


Hi guys, I am just tuning in now (apologies if I am misinterpreting the 
conversation), but it seems that some of the proposed changes resemble what we 
were proposing for the reservation work in YARN-1051. In the sub-task YARN-1708 
we propose an extension of ResourceRequest that expresses the duration (or 
leaseDuration if you prefer) for which resources will be reserved. The same 
concept could be used here as a hint from the user about how long they expect 
to hold onto the resources. What I am suggesting is that having a time 
associated with a ResourceRequest could serve both purposes, and be a generally 
useful hint to the RM.

Thoughts?




--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-25 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043238#comment-14043238
 ] 

Wangda Tan commented on YARN-1039:
--

bq. it must be set at application creation time and all containers of the app 
will be considered long lived. This is because the RM does not keep track of 
individual container requests.
I think [~vinodkv]'s suggestion makes more sense to me: 
https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14041652&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14041652
And as [~cwelch] mentioned, we don't need the constraint that if an app is 
long-lived then all its containers must be long-lived; it's better to leave 
this decision to the app itself.




--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-24 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041778#comment-14041778
 ] 

Craig Welch commented on YARN-1039:
---

That's along the lines of what I was thinking after talking with [~xgong] and 
looking around a bit more

It sounds like we need to be able to ask the resource manager for a container 
for long-lived cases (not on a spot instance, for example), both when launching 
the AM container (in the ApplicationSubmissionContext) and when the 
ApplicationMaster wants to get a container later on (ResourceRequestProto).  
This is really a scheduling hint for the resource manager (in both cases)

We need to be able to mark an application as long running for other reasons 
(adjusting progress bar behavior, etc)  

We need to be able to tell the node manager that a container will be long 
running when it is launched (to adjust logging behavior, etc). An application 
master may launch instances not like itself (some not long running when it is 
long running), which it can, as the application master can specify whatever it 
wants to in the ResourceRequestProto.

I do think it would be good to keep the interface as consistent as possible, 
and should probably have at least a rough idea of the whole picture before 
making additions.

I suggest this:
# An enum of scheduling constraints, initially only to include LONG_RUNNING; 
later it would include affinity, etc. This is solely for node selection by the 
resource manager.
# A repeated field of this enum on the ResourceRequestProto and the 
ApplicationSubmissionContext. In both cases this is purely a constraint on 
where the container is placed (the application master in the latter case).
# Go ahead and use a tag [~zjshen] on the application submission to indicate 
that an application is long-running for purposes of display (things like the 
progress bar, etc.); that seems to be an appropriate use for application tags.
# A boolean value on the ContainerLaunchContextProto to indicate it is 
long-running.
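
The four pieces of this suggestion can be pictured with the simplified model below. These dataclasses are stand-ins written for illustration only; they borrow the names used in the comment but are not the real YARN records or protobuf messages, and the field names are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class SchedulingConstraint(Enum):
    LONG_RUNNING = 1  # affinity and others could be added later


@dataclass
class ResourceRequest:
    # Repeated constraint field: placement hint for this container.
    constraints: List[SchedulingConstraint] = field(default_factory=list)


@dataclass
class ApplicationSubmissionContext:
    # Same repeated field, applied to placement of the AM container.
    constraints: List[SchedulingConstraint] = field(default_factory=list)
    # Application tags, used only for display purposes here.
    tags: Set[str] = field(default_factory=set)


@dataclass
class ContainerLaunchContext:
    # Tells the node manager to adjust behavior such as logging.
    long_running: bool = False
```

With this shape, a long-running application master can still request an ordinary short-running worker by leaving the worker's `constraints` empty.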

There are some tradeoffs in this approach but I think it's good overall:
# All the variations we have identified are covered.
# It is consistent in how it handles launching a long-running container for 
both the application master and other containers.
# It is also consistent with the approach to date wrt the application 
submission context and the resource request (where items needed for launching 
the application master container are added to the application submission 
context).
# When other scheduler constraints relevant for an application master are 
introduced later, the api will not need to change to accommodate them (other 
than adding them to the enum).
# We reuse the application tag for display and other like purposes, and in 
general are adding the minimum necessary to cover the identified cases.

(I thought it was simplest to just use a boolean on the container launch 
context, in that case the behavior is one way or the other, and other 
scheduling constraints don't apply).

Thoughts?


 Add parameter for YARN resource requests to indicate long lived
 -

 Key: YARN-1039
 URL: https://issues.apache.org/jira/browse/YARN-1039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
Priority: Minor
 Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch


 A container request could support a new parameter long-lived. This could be 
 used by a scheduler that would know not to host the service on a transient 
 (cloud: spot priced) node.
 Schedulers could also decide whether or not to allocate multiple long-lived 
 containers on the same node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-24 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041874#comment-14041874
]

Zhijie Shen commented on YARN-1039:
---

bq. I see. I'd assume that the service flag would imply long-lived, but maybe
they could be separated.

Just thinking out loud. Please correct me if I'm wrong or missing something.

service and long-lived overlap to some extent when describing an
application, as we usually think a service is going to run for a long time.
However, IMHO, being a service should not be a prerequisite for being
long-lived. Theoretically, an MR job can be big enough to run for a long time
as well. We may want to distinguish service applications from others by some
of their native characteristics. For example, progress is not going to make
sense for applications that are labeled service, while we still want it for
an MR job even if it runs for days, don't we? Moreover, service sounds like
an application-level-only property, and we won't mark a single container as a
service.

On the other hand, long-lived is used to mark an application that is supposed
to run for a long time. However, it can only indicate that the application is
likely to run for a long time; it cannot guarantee that it actually will. I'm
wondering if we really need to mark an application long-lived at submission.
Is it feasible to judge whether an application is long-lived by how much time
it has already spent in the cluster, so that long-lived applications are
handled properly in an implicit way? For example, when we come to AM retry
opportunities (one issue for long-lived applications), we can choose to
refresh the quota given that the application has been working well for a
while. We don't need to rely on a long-lived label. The one reason I can think
of why we must have this label upfront is that some special treatment for
long-lived applications has to start from the beginning.
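
The implicit approach above (refreshing the AM retry quota once an attempt has run well for a while) could be sketched roughly as follows. This is only an illustration: `AmRetryBudget`, `maxAttempts`, and `healthyWindow` are hypothetical names, not existing YARN classes or settings.

```java
import java.time.Duration;

/** Hypothetical helper, not a YARN class: tracks AM failures against a retry quota. */
final class AmRetryBudget {
    private final int maxAttempts;        // attempts allowed before giving up
    private final Duration healthyWindow; // runtime after which an attempt counts as "working well"
    private int failures = 0;

    AmRetryBudget(int maxAttempts, Duration healthyWindow) {
        this.maxAttempts = maxAttempts;
        this.healthyWindow = healthyWindow;
    }

    /**
     * Records a failed attempt and reports whether another retry is allowed.
     * If the failed attempt ran for at least healthyWindow, earlier failures
     * are forgotten first, so the quota is effectively refreshed.
     */
    boolean onAttemptFailed(Duration attemptRuntime) {
        if (attemptRuntime.compareTo(healthyWindow) >= 0) {
            failures = 0;
        }
        failures++;
        return failures < maxAttempts;
    }

    int failures() {
        return failures;
    }

    public static void main(String[] args) {
        AmRetryBudget budget = new AmRetryBudget(2, Duration.ofHours(1));
        System.out.println(budget.onAttemptFailed(Duration.ofMinutes(5)));
        System.out.println(budget.onAttemptFailed(Duration.ofHours(3)));
    }
}
```

With this shape no label is needed at submission; the cost is that any special treatment can only begin once the application has proven itself, which is the limitation noted above.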


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-24 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041875#comment-14041875
 ] 

Zhijie Shen commented on YARN-1039:
---

Upgraded the JIRA to Major given the long discussion.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-24 Thread Anubhav Dhoot (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042350#comment-14042350
 ] 

Anubhav Dhoot commented on YARN-1039:
-

The JIRA for tokens for long-lived applications is 
[YARN-941|https://issues.apache.org/jira/browse/YARN-941]


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-24 Thread Steve Loughran (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042370#comment-14042370
]

Steve Loughran commented on YARN-1039:
--

bq. We need to have another flag to indicate that the containers requested by
an AM will be long lived, it must be set at application creation time and all
containers of the app will be considered long lived. This is because the RM
does not keep track of individual container requests.

I see this, but disagree, as it doesn't meet all use cases. For different
requests we may want: long-lived, pre-emptible, anti-affine, ...

This can't go in requests, as you point out, but we already have a per-request
flag that really sets a bit in the priority level: the lax placement option.

If the other requests set the values at that priority then it is similar.

Even so, setting these values in a request is confusing, even today. It would
be better to have some operation to get/set the attributes of a priority for
requests.

This would be a bigger change... something we may not want to rush into.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-24 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042469#comment-14042469
 ] 

Craig Welch commented on YARN-1039:
---

{quote}We need to have a flag to indicate an AM is long lived.  We need to have 
another flag to indicate that the containers requested by an AM will be long 
lived,{quote}

I believe the proposal meets those needs, it is one particular way to do so...

{quote}it must be set at application creation time and all containers of the 
app will be considered long lived. This is because the RM does not keep track 
of individual container requests.{quote}

I'm not sure why it matters whether or not the RM keeps track of the container 
requests - the AM will request containers with scheduling constraints like long 
lived, affinity, etc., and the RM considers them when selecting nodes; after 
that completes it no longer necessarily matters.  If the AM needs a 
relationship between nodes for a request, or a particular type of selection 
(not on a spot node - long-running), it will make a request for those nodes, 
get nodes that meet its needs, and it's good to go.  It sounds as though it 
would be more flexible / meet a wider set of use cases, and therefore be 
preferable, to allow an application master to obtain different types of 
containers for different purposes during its lifetime, as opposed to forcing 
it to use only one set of container constraints throughout.

{quote}Having a long enum of flag to indicated optional qualities of the 
requested containers has been discussed in the past (in the context of some 
JIRAs related to Llama) and it has been discarded as it would mean divergence 
on the features different schedulers support.{quote}

So, for this jira there is a desire to support selecting nodes with particular 
qualities (not placing a long running process on a temporary/spot instance); 
coming soon are other needs for similar selection/constraint logic 
(affinity, anti-affinity, etc.) - not being able to indicate qualities for the 
containers would keep us from being able to support those needs, and I believe 
there is a need to support this functionality.  It's 
filtering/constraint/selection logic and could probably be generalized in a way 
which could be used by various schedulers...
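
As a rough illustration of that kind of selection logic, here is a minimal sketch of a filter that keeps long-lived requests off transient (spot) nodes. `NodeInfo`, `transientNode`, and `LongLivedPlacement` are hypothetical names invented for the example; nothing here is an existing scheduler API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical view of a node: its host name and whether it may disappear (e.g. spot priced). */
final class NodeInfo {
    final String host;
    final boolean transientNode;

    NodeInfo(String host, boolean transientNode) {
        this.host = host;
        this.transientNode = transientNode;
    }
}

/** Hypothetical constraint filter a scheduler could apply before picking a node. */
final class LongLivedPlacement {
    /** Returns the hosts eligible for a request; long-lived requests skip transient nodes. */
    static List<String> eligibleHosts(List<NodeInfo> nodes, boolean longLived) {
        return nodes.stream()
                .filter(n -> !(longLived && n.transientNode))
                .map(n -> n.host)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<NodeInfo> nodes = Arrays.asList(
                new NodeInfo("stable-1", false),
                new NodeInfo("spot-1", true));
        System.out.println(eligibleHosts(nodes, true));
        System.out.println(eligibleHosts(nodes, false));
    }
}
```

Once the node is chosen the constraint has done its job, which is why the RM would not need to remember it per request in this sketch.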

 


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-23 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041086#comment-14041086
 ] 

Craig Welch commented on YARN-1039:
---

[~ste...@apache.org] wrt the need for a container level flag / a way for the 
application master to launch long lived containers - definitely, but the idea 
was for that to come as a later step - although that may be short-sighted, as 
it may be better to come up with a common way to do this for the application 
master container and the containers it later launches now instead of ending up 
with unmatched approaches later...

This first step is to provide a way for the application master to be launched 
in a long lived container (generally, an application master for a long lived 
application will need to itself be launched in a long lived container - at 
least, it needs to be possible to do so) - which is why there needs to be some 
way to indicate the need for a long lived container during application 
submission (necessary but not sufficient overall...)

[~zjshen] I was also wondering about using the tags, but after talking with 
[~xgong] we don't think that is the way to go, because tags don't seem to 
be about changing behavior but only a freeform way to enable 
search/display/etc.

After this discussion and some looking around it really seems that what we are 
after is a way to communicate a quality of the needed container to the resource 
manager both at application submission (for the application master container) 
and also for later container launches by the master, kind of like the 
ResourceProto, which is also already present in both cases for the same reason 
(I suggested adding it there, actually, as something necessary for the 
container, but [~xgong] objected, thinking it is really specific to metric 
qualities (cpu, memory...)).

I'm going to take a look at adding something alongside / similar to the 
ResourceProto to indicate constraints/requirements for the container, starting 
with long lived, that can be common to application submission and when the 
containers are started later by the application; not necessarily a long field 
for bit manipulation, but something which is also extensible.



[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-23 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041105#comment-14041105
 ] 

Steve Loughran commented on YARN-1039:
--

I see. I'd assume that the service flag would imply long-lived, but maybe 
they could be separated.

I'd like to see a {{long}} enum of flags here as it's easier to be forwards 
compatible.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-23 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041110#comment-14041110
 ] 

Craig Welch commented on YARN-1039:
---

The more I look around, the better I like the idea of adding it to the resource 
proto.  It is the same kind of thing as the items already in there - it's a 
characteristic required for the container (it isn't a metric style quality, but 
still, it's a characteristic of the resource needed) and it is already present 
everywhere the information is needed (at application submission and when 
containers are requested).  Adding something so similar alongside the resource 
proto seems unnecessary.  Do you agree with [~xgong]'s concerns or do you think 
it makes sense to add it there?


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-23 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041652#comment-14041652
 ] 

Vinod Kumar Vavilapalli commented on YARN-1039:
---

I am not against a container/resource level definition of whether that 
container is long lived or not, but I think it is equally important to mark at 
the application level if _at least_ one container in the application is 
considered long lived. So, to summarize, how about
 - an app-level isLongRunning() that indicates _if at least one container of 
this application will be long-running_ and
 - a resource-request level isLongRunning() that indicates _if this container 
is long running or not_.

The app-level flag can help UIs, making very quick scheduling distinctions etc.

Thoughts?
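
A minimal sketch of the two levels, for illustration only: `AppSubmissionSketch` and `ResourceRequestSketch` are hypothetical stand-ins, not the real YARN records, and the `accepts` consistency check is an assumption derived from the "at least one container" wording rather than part of the proposal.

```java
/** Hypothetical stand-in for a per-request flag: is this container long running? */
final class ResourceRequestSketch {
    private final boolean longRunning;

    ResourceRequestSketch(boolean longRunning) {
        this.longRunning = longRunning;
    }

    boolean isLongRunning() {
        return longRunning;
    }
}

/** Hypothetical stand-in for the app-level flag declared at submission time. */
final class AppSubmissionSketch {
    private final boolean longRunning;

    AppSubmissionSketch(boolean longRunning) {
        this.longRunning = longRunning;
    }

    /** True if at least one container of the application will be long running. */
    boolean isLongRunning() {
        return longRunning;
    }

    /**
     * Assumed consistency rule: an app that did not declare itself long running
     * should not later ask for a long-running container.
     */
    boolean accepts(ResourceRequestSketch request) {
        return longRunning || !request.isLongRunning();
    }

    public static void main(String[] args) {
        AppSubmissionSketch batchApp = new AppSubmissionSketch(false);
        System.out.println(batchApp.accepts(new ResourceRequestSketch(true)));
        System.out.println(batchApp.accepts(new ResourceRequestSketch(false)));
    }
}
```

The app-level flag is a single boolean read, which is what makes it cheap for UIs and quick scheduling decisions; the per-request flag carries the finer detail.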


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-21 Thread Varun Vasudev (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039744#comment-14039744
 ] 

Varun Vasudev commented on YARN-1039:
-

I agree with [~zjshen]. Using the tags field also means we don't have to worry 
about switching to an enum like [~cwelch] mentioned in one of the earlier 
comments.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-21 Thread Steve Loughran (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039976#comment-14039976
]

Steve Loughran commented on YARN-1039:
--

# I'd put the long-lived flag on the container request, *not the AM launch
request*. An AM may wish to indicate that some containers are shortlife, others
long-lived.
# If the tag approach lets my AM add this request while running with the 2.4
JARs, even though the hint will be ignored, I'm happy. Protobuf may be agile,
but the generated proto classes aren't, and working with fields directly is
hard to do and introspection is brittle. I know that from working with the AM
restart flag.
# Otherwise, I'd like a long64 with bits we can set and read. It's the
cross-platform way and would give us a single field for future additions.
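
A minimal sketch of the single-field bit-flag option from the last point. The class name, flag names, and bit positions are hypothetical, chosen only for the example; no such field exists in the YARN protos discussed here.

```java
/** Hypothetical bit flags packed into one 64-bit field on a container request. */
final class RequestFlags {
    static final long LONG_LIVED  = 1L << 0;
    static final long PREEMPTIBLE = 1L << 1;
    static final long ANTI_AFFINE = 1L << 2;

    private RequestFlags() {
    }

    /** Returns flags with the given bit turned on. */
    static long set(long flags, long flag) {
        return flags | flag;
    }

    /** Returns flags with the given bit turned off. */
    static long clear(long flags, long flag) {
        return flags & ~flag;
    }

    /** Tests a single bit; bits this reader does not know about are simply ignored. */
    static boolean isSet(long flags, long flag) {
        return (flags & flag) != 0;
    }

    public static void main(String[] args) {
        long flags = set(set(0L, LONG_LIVED), ANTI_AFFINE);
        System.out.println(isSet(flags, LONG_LIVED));
        System.out.println(isSet(flags, PREEMPTIBLE));
    }
}
```

Because an older reader only tests the bits it knows, a newer client can set additional bits without breaking it, which is the forward-compatibility argument for one field.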


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-21 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040001#comment-14040001
]

Zhijie Shen commented on YARN-1039:
---

bq. An AM may wish to indicate that some containers are shortlife, others
long-lived.

A container-level long-lived flag is an interesting idea. If any container of
an app is long-lived, the AM container is automatically going to be long-lived
as well, right? Presumably the AM should last until the whole app exits. Shall
we mark an app long-lived, and then allow a long-lived app to start long-lived
containers?

bq. If the tag approach lets my AM add this request while running with the 2.4
JARs even though the hint will be ignored I'm happy.

If the granularity is going to be the container, the tag may not help, as it
is application-level information.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-20 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039152#comment-14039152
 ] 

Craig Welch commented on YARN-1039:
---

Not to make things overly complex, but we were talking about making this an 
enum rather than a simple boolean, with the notion that this is one of a number 
of possible scheduler hints we may want to support - with values like 
PERSISTENT, TRANSIENT, RELOCATABLE, etc (a single value or possibly a list/set 
of values, for cases which are not mutually exclusive).  Thoughts? 
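
For illustration, the enum-plus-set variant could look roughly like this. The three values are the ones named above; the `SchedulerHints` class and the rule that PERSISTENT and TRANSIENT exclude each other are assumptions made for the sketch.

```java
import java.util.EnumSet;
import java.util.Set;

/** Scheduler hints named in the comment above. */
enum SchedulerHint {
    PERSISTENT, TRANSIENT, RELOCATABLE
}

/** Hypothetical helper for validating a set of hints on a request. */
final class SchedulerHints {
    private SchedulerHints() {
    }

    /** Assumed rule: a request cannot be both PERSISTENT and TRANSIENT. */
    static boolean isValid(Set<SchedulerHint> hints) {
        return !(hints.contains(SchedulerHint.PERSISTENT)
                && hints.contains(SchedulerHint.TRANSIENT));
    }

    public static void main(String[] args) {
        Set<SchedulerHint> hints =
                EnumSet.of(SchedulerHint.PERSISTENT, SchedulerHint.RELOCATABLE);
        System.out.println(isValid(hints));
    }
}
```

A set handles the values that are not mutually exclusive, while a check like `isValid` covers the ones that are.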


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-20 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039518#comment-14039518
 ] 

Craig Welch commented on YARN-1039:
---

I went ahead and just added a boolean flag - there does seem to be room to 
generalize this in the future but at the moment it's not entirely clear to me 
how best to do that / that there are enough examples to do it properly.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039557#comment-14039557
 ] 

Hadoop QA commented on YARN-1039:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651767/YARN-1039.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/4035//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4035//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4035//console

This message is automatically generated.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039613#comment-14039613
 ] 

Hadoop QA commented on YARN-1039:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651782/YARN-1039.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4036//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4036//console

This message is automatically generated.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039648#comment-14039648
 ] 

Hadoop QA commented on YARN-1039:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651791/YARN-1039.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4038//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4038//console

This message is automatically generated.


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-20 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039679#comment-14039679
 ] 

Zhijie Shen commented on YARN-1039:
---

Can we make use of the tag in the application submission context directly, 
instead of adding a dedicated field?


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-19 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038041#comment-14038041
 ] 

Steve Loughran commented on YARN-1039:
--

Marking as depended on by YARN-896.

I would keep the affinity logic separate, as discussed in YARN-1042

 Add parameter for YARN resource requests to indicate long lived
 -

 Key: YARN-1039
 URL: https://issues.apache.org/jira/browse/YARN-1039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
Priority: Minor

 A container request could support a new parameter long-lived. This could be 
 used by a scheduler that would know not to host the service on a transient 
 (cloud: spot priced) node.
 Schedulers could also decide whether or not to allocate multiple long-lived 
 containers on the same node




[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-19 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038248#comment-14038248
 ] 

Vinod Kumar Vavilapalli commented on YARN-1039:
---

For now, we can start with a parameter on the ApplicationSubmissionContext - we 
are still figuring out long-running services before delving into enabling a 
smaller subset of long-lived containers within a larger application.

 Add parameter for YARN resource requests to indicate long lived
 -

 Key: YARN-1039
 URL: https://issues.apache.org/jira/browse/YARN-1039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
Priority: Minor

 A container request could support a new parameter long-lived. This could be 
 used by a scheduler that would know not to host the service on a transient 
 (cloud: spot priced) node.
 Schedulers could also decide whether or not to allocate multiple long-lived 
 containers on the same node




[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-05-15 Thread Xuan Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998396#comment-13998396
 ] 

Xuan Gong commented on YARN-1039:
-

Starting work on it. Will provide a proposal soon.

 Add parameter for YARN resource requests to indicate long lived
 -

 Key: YARN-1039
 URL: https://issues.apache.org/jira/browse/YARN-1039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Xuan Gong
Priority: Minor

 A container request could support a new parameter long-lived. This could be 
 used by a scheduler that would know not to host the service on a transient 
 (cloud: spot priced) node.
 Schedulers could also decide whether or not to allocate multiple long-lived 
 containers on the same node



