[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-08-30 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754787#comment-13754787
 ] 

Robert Joseph Evans commented on YARN-896:
--

I agree that providing a good way to handle stdout and stderr is important. I 
don't know if I want the NM to be doing this for us though, but that is an 
implementation detail that we can talk about on the follow-up JIRA.  Chris, 
feel free to file a JIRA for rolling of stdout and stderr and we can look into 
what it will take to support that properly.

 Roll up for long lived YARN
 ---

 Key: YARN-896
 URL: https://issues.apache.org/jira/browse/YARN-896
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Robert Joseph Evans

 YARN is intended to be general purpose, but it is missing some features to be 
 able to truly support long lived applications and long lived containers.
 This ticket is intended to
  # discuss what is needed to support long lived processes
  # track the resulting JIRA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-08-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754812#comment-13754812
 ] 

Jason Lowe commented on YARN-896:
-

bq. Chris, feel free to file a JIRA for rolling of stdout and stderr and we can 
look into what it will take to support that properly.

[~ste...@apache.org] recently filed YARN-1104 as a subtask of this JIRA which 
covers the NM rolling stdout/stderr.  We can transmute that JIRA into whatever 
ends up rolling the logs if it's not the NM.



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-08-19 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743819#comment-13743819
 ] 

Robert Joseph Evans commented on YARN-896:
--

[~criccomini],

That is a great point.  To do this we need the application to somehow inform 
YARN that it is a long lived application.  We could do this either through some 
sort of metadata that is submitted with the application to YARN, possibly 
through the service registry, or even perhaps just setting the progress to a 
special value like -1.  I think I would prefer the first one, because then YARN 
could use that metadata later on for other things.  After that the UI change 
should not be too difficult.  If you want to file a JIRA for it, either as a 
sub task or just link it in, that would be great.
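
A minimal sketch of the first option (metadata submitted with the 
application), written in Python for brevity. The names AppReport, long_lived 
and render_progress are hypothetical and are not part of any YARN API; the 
only point is that a UI could branch on a flag rather than overload the 
progress value.

```python
from dataclasses import dataclass


@dataclass
class AppReport:
    """Hypothetical stand-in for what a UI would know about an application."""
    name: str
    progress: float           # fraction of work done, 0.0 to 1.0
    long_lived: bool = False  # assumed metadata flag set at submission time


def render_progress(report: AppReport) -> str:
    """Return the text a UI might show in the progress column."""
    if report.long_lived:
        # A service has no expected end, so a percentage is meaningless.
        return "running (service)"
    # Clamp so a misbehaving AM cannot push the bar outside 0-100%.
    clamped = min(max(report.progress, 0.0), 1.0)
    return f"{clamped * 100:.0f}%"
```

With a flag like this, the same metadata could later drive other decisions 
without changing what the progress number means.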



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-08-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744029#comment-13744029
 ] 

Steve Loughran commented on YARN-896:
-

Chris, I use the bar today as a measure of expected nodes vs. actual, i.e. what 
percentage of the goal of work has been met. That is free to vary up and down 
with node failures, so the percent bar can move in both directions.

YARN-1039 already says to add a flag to say long-lived, so that future versions 
of YARN can behave differently. This could do more than the GUI. In particular, 
YARN-3 cgroup limits would be something you may want to turn on for services, 
to limit their RAM & CPU to exactly what they asked for. If a long-lived 
service underestimates its requirements, the impact on the node is worse than if 
a short-lived container does it; for the latter you may want to be more forgiving.
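
As a rough sketch of the "expected vs. actual" measure described above (the 
function name and the clamping behaviour are assumptions made for 
illustration, not how any existing AM is known to compute it):

```python
def service_progress(actual_nodes: int, expected_nodes: int) -> float:
    """Fraction of the desired node count that is currently live."""
    if expected_nodes <= 0:
        # Nothing requested, so the goal is trivially met.
        return 1.0
    # Cap at 1.0 in case extra containers are briefly running.
    return min(actual_nodes / expected_nodes, 1.0)
```

Because the value is recomputed from the live node count, it drops when a 
node fails and rises again once a replacement container starts.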



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-08-19 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744061#comment-13744061
 ] 

Chris Riccomini commented on YARN-896:
--

[~stev...@iseran.com] I've linked the JIRAs as "relates to". The progress 
behavior you're describing is somewhat reasonable, but a bit unintuitive. Still 
feels like a hack. If that's the route we want to go, we should change the UI 
accordingly. If you think YARN-1079 is a dupe, feel free to close and update 
YARN-1039 with UI notes.

Regarding CGroup limits, have a look at YARN-810. Might be related to what 
you're saying.



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-08-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742830#comment-13742830
 ] 

Chris Riccomini commented on YARN-896:
--

Also, any idea what to do regarding long lived YARN processes (i.e. services 
that have no expected end) and the progress bar in YARN?



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-08-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734983#comment-13734983
 ] 

Robert Joseph Evans commented on YARN-896:
--

Sorry I have not responded sooner.  I have been out on vacation and had a high 
severity issue that has consumed a lot of my time.

[~lmccay] and [~thw] There are many different services that long lived 
processes need to communicate with.  Many of these services use tokens and 
others may not.  Each of these tokens or other credentials is specific to the 
services being accessed.  In some cases, as with HBase, we probably can take 
advantage of the existing renewal feature in the RM.  With other tokens or 
credentials it may be different, and may require AM specific support for them. 
I am not really that concerned with solving the renewal problem for all 
possible credentials here, although if we can solve this for a lot of common 
tokens at the same time that would be great. What I care most about is being 
sure that a long lived YARN application does not necessarily have to stop and 
restart because an HDFS token cannot be renewed any longer.  If there are 
changes going into the HDFS security model that would make YARN-941 unnecessary 
that is great.  I have not had much time to follow the security discussion so 
thank you for pointing this out.  But it is also a question of time frames.  
YARN-941 and YARN-1041 would allow for secure, robust, long lived applications 
on YARN, and do not appear to be that difficult to accomplish.  Do you know the 
time frame for the security rework?



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-08-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727476#comment-13727476
 ] 

Steve Loughran commented on YARN-896:
-

YARN-1011 (speculative containers) may be useful here too: you could have some 
speculative containers that may come and go alongside a set of static 
containers that have longer lifespans.



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-08-02 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727680#comment-13727680
 ] 

Larry McCay commented on YARN-896:
--

While I am missing some of the important context of how tokens are issued for 
these long lived containers, I can introduce another pattern for token use that 
may be of some interest. 

If an application, when submitted to the RM, included tokens that represent 
the application's identity and have a sufficiently long expiration date, then 
they could be exchanged for shorter lived access tokens. Upon completion, or on 
being flagged as rogue, the identity token can be revoked/invalidated, at which 
time the bearer could no longer acquire access tokens with it. This pattern 
eliminates the finite lifespan issue that tokens such as the delegation token 
have and at the same time reduces the amount of damage that can be done with an 
access token. This pattern is being discussed as part of the Hadoop SSO efforts 
for user authentication which you can find at HADOOP-9533 and HADOOP-9392. I 
have also filed a JIRA and have a preliminary patch posted for a JsonWebToken 
for use in such a pattern: HADOOP-9781. It utilizes PKI based cryptography for 
signing and verifying the token which is supported with a dependency on JIRA 
HADOOP-9534 for a credential management framework.
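
The exchange pattern can be sketched roughly as follows. This is an 
illustration only and is not the HADOOP-9781 patch: TokenService and all of 
its methods are hypothetical names, the sketch signs with a shared-secret 
HMAC where the proposal uses PKI based signatures, and the identity token 
here carries no expiration date of its own.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


class TokenService:
    """Hypothetical issuer: identity tokens are exchanged for short-lived access tokens."""

    def __init__(self, access_ttl_s: float = 300.0) -> None:
        self._key = secrets.token_bytes(32)  # signing key (HMAC here, not PKI)
        self._revoked = set()                # identity tokens that were invalidated
        self._access_ttl_s = access_ttl_s

    def _sign(self, payload: str) -> str:
        return hmac.new(self._key, payload.encode(), hashlib.sha256).hexdigest()

    def issue_identity(self, app_id: str) -> str:
        """Issue the long-lived token that represents the application's identity."""
        return f"{app_id}.{self._sign('id:' + app_id)}"

    def revoke(self, identity_token: str) -> None:
        """Invalidate an identity token, e.g. on completion or rogue behaviour."""
        self._revoked.add(identity_token)

    def exchange(self, identity_token: str, now: Optional[float] = None) -> str:
        """Trade a valid identity token for an access token that expires soon."""
        now = time.time() if now is None else now
        app_id, _, sig = identity_token.rpartition(".")
        if identity_token in self._revoked:
            raise PermissionError("identity token revoked")
        if not hmac.compare_digest(sig, self._sign("id:" + app_id)):
            raise PermissionError("bad identity token")
        payload = f"{app_id}|{now + self._access_ttl_s}"
        return f"{payload}|{self._sign('access:' + payload)}"

    def verify_access(self, access_token: str, now: Optional[float] = None) -> bool:
        """Check the signature and the expiry embedded in an access token."""
        now = time.time() if now is None else now
        payload, _, sig = access_token.rpartition("|")
        if not hmac.compare_digest(sig, self._sign("access:" + payload)):
            return False
        _, _, expires = payload.rpartition("|")
        return now < float(expires)
```

A stolen access token is only useful until it expires, and revoking the 
identity token stops any further exchange, which is the damage-limiting 
property described above.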



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-30 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723968#comment-13723968
 ] 

Siddharth Seth commented on YARN-896:
-

bq. Robert Joseph Evans Applications may connect to other services such as 
HBase or issue tokens for communication between its own containers. All of 
these would require renewal.

The RM takes care of renewing tokens for HDFS; it can do this since the HDFS 
token renewer class is in the RM's classpath. For other applications, Hive for 
example, this isn't possible. I believe Hive ends up issuing tokens which are 
valid for a longer duration to get around the renewal problem. I wouldn't 
necessarily link this to long running YARN though, other than the bit about 
the token max-age.



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-23 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717572#comment-13717572
 ] 

Thomas Weise commented on YARN-896:
---

[~revans2] Applications may connect to other services such as HBase or issue 
tokens for communication between its own containers. All of these would require 
renewal.




[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-19 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713692#comment-13713692
 ] 

Robert Joseph Evans commented on YARN-896:
--

[~thw] I am not totally sure what you mean by app specific tokens.  Are these 
tokens that the app is going to use to connect to other services like HBase, or 
is it something else?

[~eric14] and [~enis] Rolling upgrades is a very interesting use case.  We can 
definitely add in a ticket to support this type of thing.  I agree that it 
needs to be thought through some, and is going to require help from both the AM 
and YARN to do it properly.



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-19 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713992#comment-13713992
 ] 

Robert Joseph Evans commented on YARN-896:
--

I filed one new JIRA for updating tokens in the RM: YARN-941.

I started to file a JIRA for the AM to be informed of the location of its 
already running containers, but as I was writing it I realized that it will not 
give us enough information to be able to reattach to the containers.  The only 
thing it will give us is enough info to be able to go shoot the containers, 
simply because there is no metadata about what port the container may be 
listening on or anything like that.  It seems to me that we would be better off 
keeping a log, similar to the MR job history log, that has in it all the data 
the AM needs to look for running containers.  If others see a different need 
for this API, I am still happy to file a JIRA for it.
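
To make the log idea concrete, here is a minimal sketch of an append-only 
event log that a restarted AM could replay. The JSON-lines layout, the event 
names ("launched", "completed") and the host/port fields are assumptions 
chosen for illustration; they do not mirror the MR job history format.

```python
import json
from typing import Dict, Iterable


def record(event: str, container_id: str, **details) -> str:
    """Serialize one event as a JSON line to append to the AM's log."""
    return json.dumps({"event": event, "container": container_id, **details})


def running_containers(lines: Iterable[str]) -> Dict[str, dict]:
    """Replay the log and return the containers believed to be still running."""
    live: Dict[str, dict] = {}
    for line in lines:
        entry = json.loads(line)
        if entry["event"] == "launched":
            # Keep whatever the AM needs to reattach, e.g. where it listens.
            live[entry["container"]] = {"host": entry["host"], "port": entry["port"]}
        elif entry["event"] == "completed":
            live.pop(entry["container"], None)
    return live
```

The replayed result is only a list of candidates; a new AM would still have 
to probe each host and port, since a container may have died without a 
"completed" record being written.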

I have not filed a JIRA for anti-affinity yet either.  I seem to remember 
another JIRA for something like this already, but I have not found it yet. I 
figure we can add in a long lived process flag for the scheduler when we run 
across a use case for it.

The other parts discussed here either already have a JIRA associated with the 
same functionality or, I think, need a bit more discussion about exactly what we 
want to do, namely log aggregation/processing and Hadoop package 
management/rolling upgrades of live applications.

If I missed something please let me know.



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-16 Thread eric baldeschwieler (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709956#comment-13709956
 ] 

eric baldeschwieler commented on YARN-896:
--

IMO, you should be able to run a new framework / service simply by dropping a 
tarball / jar / war sort of thing into a well-known place and pointing to it in 
your Job invocation.

I'm not sure what beyond this and the distributed cache Hoya would need to 
deploy HBase, but it would be great to get it to the point where you simply 
drop either just a Hoya package (that contains a version of HBase) or Hoya and an 
HBase tarball into HDFS.

Let's discuss and make a proposal.







[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-15 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709242#comment-13709242
 ] 

Thomas Weise commented on YARN-896:
---

We also identified the need for token renewal (app specific tokens). This 
should be a common need for long running services. Has it been discussed 
elsewhere?



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-12 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707647#comment-13707647
 ] 

Thomas Weise commented on YARN-896:
---

Bobby, thanks for putting this together. Some items from the DataTorrent wish 
list (most already covered above):
* gang scheduling (similar to 
[YARN-624|https://issues.apache.org/jira/browse/YARN-624?focusedCommentId=13662352&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13662352])
* affinity, anti-affinity
* return resource requests that cannot be met
* attach restarted AM to existing containers
* service registry




[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-10 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704623#comment-13704623
 ] 

Robert Joseph Evans commented on YARN-896:
--

Chris, Yes I missed the app master retry issue.  Those two with the discussion 
on them seem to cover what we are looking for.



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703622#comment-13703622
 ] 

Robert Joseph Evans commented on YARN-896:
--

No comments in the past few days.  I would like to hear from more people 
involved, even if it is just to say that it looks like we have everything 
covered here.  Then we can start filing JIRAs and getting some work done.



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-03 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699314#comment-13699314
 ] 

Steve Loughran commented on YARN-896:
-

Based on our Hoya, HBase on YARN work:
* we need a restarted AM to be given the existing set of containers from its 
previous instance. The use case there is region servers should stay up while 
the AM and master are restarted.
* maybe: be able to warn YARN that the services will be long-lived. That could 
be used in scheduling and placement.
* anti-affinity is needed to declare that different container instances SHOULD 
be deployed on different nodes (use case: region servers). If failure domains 
are supported in the topology, anti-affinity should use that. I don't know if 
we'd want best-effort vs absolute requirements.
* add ability to increase requirements of running containers, e.g. say this 
service is using more RAM than expected, reduce the amount available to others.
* maybe: ability to send kill signals to container processes, to do a graceful 
kill before escalating. This is of limited value if an extra process (such as 
{{bin/hbase}}) intervenes in the startup process.

There's also long-lived service discovery, a topic for another JIRA.
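
The anti-affinity item in the list above, including the open question of 
best-effort vs. absolute requirements, can be sketched as a small placement 
function. Everything here is hypothetical (the function name, the 
failure_domain mapping, the strict flag); it illustrates the requirement and 
is not a proposal for the scheduler API.

```python
from typing import Dict, List, Optional


def place_anti_affine(
    count: int,
    nodes: List[str],
    failure_domain: Dict[str, str],
    strict: bool = True,
) -> Optional[List[str]]:
    """Pick nodes for `count` containers, spreading them across failure domains.

    Returns None when strict=True and there are not enough distinct domains.
    With strict=False the result may reuse domains, and may be shorter than
    `count` if there are fewer nodes than containers.
    """
    chosen: List[str] = []
    used_domains = set()
    # First pass: at most one container per failure domain.
    for node in nodes:
        if len(chosen) == count:
            break
        domain = failure_domain.get(node, node)  # default: a node is its own domain
        if domain not in used_domains:
            used_domains.add(domain)
            chosen.append(node)
    if len(chosen) == count:
        return chosen
    if strict:
        return None
    # Best effort: fill the remainder with unused nodes, reusing domains.
    for node in nodes:
        if len(chosen) == count:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen
```

For region servers, strict=True would model an absolute requirement, while 
strict=False falls back to sharing a rack rather than leaving containers 
unplaced.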




[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-02 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698500#comment-13698500
 ] 

Robert Joseph Evans commented on YARN-896:
--

During the most recent Hadoop Summit there was a developer meetup where we 
discussed some of these issues.  This is to summarize what was discussed at 
that meeting and to add in a few things that have also been discussed on 
mailing lists and other places.

HDFS delegation tokens have a maximum lifetime. Currently tokens submitted to 
the RM when the app master is launched will be renewed by the RM until the 
application finishes and the logs from the application have finished 
aggregating.  The only token currently used by the YARN framework is the HDFS 
delegation token.  This is used to read files from HDFS as part of the 
distributed cache and to write the aggregated logs out to HDFS.

In order to support relaunching an app master after the maximum 
lifetime of the HDFS delegation token has passed, we either need to allow for tokens that 
do not expire or provide an API to allow the RM to replace the old token with a 
new one.  Because removing the maximum lifetime of a token reduces the security 
of the cluster as a whole I think it would be better to provide an API to 
replace the token with a new one.

If we want to continue supporting log aggregation we also need to provide a way 
for the Node Managers to get the new token too.  It is assumed that each app 
master will also provide an API to get the new token so it can start using it.
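
The difference between renewing a token and replacing it can be shown with a 
small model. This is a sketch under assumed names (DelegationToken, renew, 
needs_replacement) and does not reflect the actual HDFS or RM classes: 
renewal can only push the expiry forward up to a hard limit, after which a 
new token has to be handed to whoever still needs one.

```python
from dataclasses import dataclass


@dataclass
class DelegationToken:
    """Hypothetical model of a renewable token with a hard maximum lifetime."""
    issued_at: float         # seconds since epoch
    max_lifetime_s: float    # renewal cannot extend past issued_at + this
    renew_interval_s: float  # how far one renewal pushes the expiry
    expires_at: float        # current expiry, moved forward by each renewal


def renew(token: DelegationToken, now: float) -> bool:
    """Extend the expiry by one interval, capped at the maximum lifetime."""
    hard_limit = token.issued_at + token.max_lifetime_s
    if now >= token.expires_at or now >= hard_limit:
        return False  # already expired; renewal is no longer possible
    token.expires_at = min(now + token.renew_interval_s, hard_limit)
    return True


def needs_replacement(token: DelegationToken, now: float, margin_s: float) -> bool:
    """True once renewal alone cannot keep the token valid past the margin."""
    hard_limit = token.issued_at + token.max_lifetime_s
    return now + margin_s >= hard_limit
```

A replace-token API would be invoked when needs_replacement turns true, and 
the new token would then also have to reach the node managers and the app 
master.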


Log aggregation is another issue, although not required for long lived 
applications to work.  Logs are aggregated into HDFS when the application 
finishes.  This is not really that useful for applications that are never 
intended to exit.  Ideally the processing of logs by the node manager should be 
pluggable so that clusters and applications can select how and when logs are 
processed/displayed to the end user.  Because many of these systems roll their 
logs to avoid filling up disks we will probably need a protocol of some sort 
for the container to communicate with the Node Manager when logs are ready to 
be processed.
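
One possible shape for such a protocol, sketched with hypothetical names 
(LogNotifier, register, log_ready) that do not exist in the Node Manager: 
the handler is the pluggable part, and the container calls log_ready after 
it rolls a file.

```python
from typing import Callable, Dict

# A handler receives (container_id, path_of_rolled_log).
LogHandler = Callable[[str, str], None]


class LogNotifier:
    """Hypothetical node-side registry that dispatches rolled logs to a handler."""

    def __init__(self) -> None:
        self._handlers: Dict[str, LogHandler] = {}

    def register(self, app_id: str, handler: LogHandler) -> None:
        """Select how logs are processed for one application."""
        self._handlers[app_id] = handler

    def log_ready(self, app_id: str, container_id: str, path: str) -> bool:
        """Called after a container rolls a log file; False if no handler is set."""
        handler = self._handlers.get(app_id)
        if handler is None:
            return False
        handler(container_id, path)
        return True
```

A cluster default could be a handler that copies the rolled file into HDFS, 
while an application could register one that ships it to its own system.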

Another issue is to allow containers to outlive the app master that launched 
them and also to allow containers to outlive the node manager that launched 
them.  This is especially critical for the stability of applications during 
rolling upgrades to YARN.



[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-02 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698505#comment-13698505
 ] 

Robert Joseph Evans commented on YARN-896:
--

Another issue that has been discussed in the past is the impact that long lived 
processes can have on resource scheduling. It is possible for a long lived 
process to grab lots of resources and then never release them, even though it is 
using more resources than it would be allowed to have when the cluster is full. 
Recent preemption changes should be able to prevent this from happening 
between different queues/pools, but we may need to consider whether we need more 
control over this within a queue.
