[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754787#comment-13754787 ] Robert Joseph Evans commented on YARN-896:

I agree that providing a good way to handle stdout and stderr is important. I don't know if I want the NM to be doing this for us, though; that is an implementation detail we can talk about on the follow-up JIRA. Chris, feel free to file a JIRA for rolling stdout and stderr, and we can look into what it will take to support that properly.

Roll up for long lived YARN
---
Key: YARN-896
URL: https://issues.apache.org/jira/browse/YARN-896
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Robert Joseph Evans

YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to
# discuss what is needed to support long lived processes
# track the resulting JIRA.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754812#comment-13754812 ] Jason Lowe commented on YARN-896:

bq. Chris, feel free to file a JIRA for rolling of stdout and stderr and we can look into what it will take to support that properly.

[~ste...@apache.org] recently filed YARN-1104 as a subtask of this JIRA which covers the NM rolling stdout/stderr. We can transmute that JIRA into whatever ends up rolling the logs if it's not the NM.
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743819#comment-13743819 ] Robert Joseph Evans commented on YARN-896:

[~criccomini], that is a great point. To do this we need the application to somehow inform YARN that it is a long lived application. We could do this through some sort of metadata that is submitted with the application to YARN, possibly through the service registry, or perhaps just by setting the progress to a special value like -1. I think I would prefer the metadata approach, because then YARN could use that metadata later on for other things. After that, the UI change should not be too difficult. If you want to file a JIRA for it, either as a subtask or just linked in, that would be great.
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744029#comment-13744029 ] Steve Loughran commented on YARN-896:

Chris, I use the bar today as a measure of expected vs. actual nodes, i.e. what percentage of the goal has been met. That is free to vary up and down with node failures, so the percent bar is free to go in both directions.

YARN-1039 already proposes adding a flag to say long-lived, so that future versions of YARN can behave differently. This could do more than the GUI: in particular, YARN-3 cgroup limits are something you may want to turn on for services, to limit their RAM and CPU to exactly what they asked for. If a long-lived service underestimates its requirements, the impact on the node is worse than if a short-lived container does it; for short-lived containers you may want to be more forgiving.
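A minimal Python sketch of the progress reporting described above, for illustration only: progress is the fraction of expected nodes that are actually running, so it can fall after node failures, and a long-lived flag (in the spirit of YARN-1039) changes only how the value is displayed. The function names, the `long_lived` flag, and the display strings are hypothetical; the clamp to [0, 1] assumes progress is reported as a fraction.

```python
def service_progress(actual_nodes: int, expected_nodes: int) -> float:
    """Fraction of the expected nodes that are actually running, clamped to [0, 1].

    Unlike batch-job progress, this can go down as well as up (node failures).
    """
    if expected_nodes <= 0:
        raise ValueError("expected_nodes must be positive")
    return max(0.0, min(1.0, actual_nodes / expected_nodes))


def render_progress(progress: float, long_lived: bool) -> str:
    """Hypothetical UI rendering: a long-lived service shows health, not completion."""
    if long_lived:
        return f"RUNNING ({progress:.0%} of target nodes)"
    return f"{progress:.0%}"
```

For example, a service that wants 10 nodes and has 8 would be shown as `RUNNING (80% of target nodes)` rather than as a job that is 80% done.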
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744061#comment-13744061 ] Chris Riccomini commented on YARN-896:

[~stev...@iseran.com] I've linked the JIRAs as "relates to". The progress behavior you're describing is somewhat reasonable, but a bit unintuitive; it still feels like a hack. If that's the route we want to go, we should change the UI accordingly. If you think YARN-1079 is a dupe, feel free to close it and update YARN-1039 with UI notes. Regarding cgroup limits, have a look at YARN-810; it might be related to what you're saying.
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742830#comment-13742830 ] Chris Riccomini commented on YARN-896:

Also, any idea what to do regarding long lived YARN processes (i.e. services that have no expected end) and the progress bar in YARN?
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734983#comment-13734983 ] Robert Joseph Evans commented on YARN-896:

Sorry I have not responded sooner. I have been out on vacation and had a high severity issue that has consumed a lot of my time.

[~lmccay] and [~thw], there are many different services that long lived processes need to communicate with. Many of these services use tokens; others may not. Each of these tokens or other credentials is specific to the service being accessed. In some cases, like HBase, we can probably take advantage of the existing renewal feature in the RM. With other tokens or credentials it may be different, and they may require AM-specific support. I am not really that concerned with solving the renewal problem for all possible credentials here, although if we can solve it for a lot of common tokens at the same time, that would be great. What I care most about is making sure that a long lived YARN application does not have to stop and restart because an HDFS token can no longer be renewed.

If there are changes going into the HDFS security model that would make YARN-941 unnecessary, that is great. I have not had much time to follow the security discussion, so thank you for pointing this out. But it is also a question of time frames. YARN-941 and YARN-1041 would allow for secure, robust, long lived applications on YARN, and do not appear to be that difficult to accomplish. Do you know the time frame for the security rework?
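A toy model of the renewal limit may help here: a token can be renewed one interval at a time but never past its maximum lifetime, so a long lived application eventually needs its token replaced rather than renewed, which is the kind of RM update YARN-941 is about. This Python sketch is illustrative only: `DelegationToken`, `AppTokens`, and their methods are hypothetical names, times are plain numbers, and handing the new token to the NMs and the AM is left out.

```python
from dataclasses import dataclass, field


@dataclass
class DelegationToken:
    """Toy token: renewable one interval at a time, never past issued + max_lifetime."""
    issued: float
    renew_interval: float
    max_lifetime: float
    expires: float = field(init=False)

    def __post_init__(self) -> None:
        self.expires = min(self.issued + self.renew_interval, self.max_date)

    @property
    def max_date(self) -> float:
        return self.issued + self.max_lifetime

    def renew(self, now: float) -> bool:
        """Extend expiry by one interval, capped at max_date; False if renewal is impossible."""
        if now > self.expires or now >= self.max_date:
            return False
        self.expires = min(now + self.renew_interval, self.max_date)
        return True


class AppTokens:
    """Hypothetical RM-side holder for one application's HDFS token."""

    def __init__(self, token: DelegationToken) -> None:
        self.token = token

    def renew_or_flag(self, now: float) -> str:
        # Renew while possible; otherwise the app must supply a fresh token.
        return "renewed" if self.token.renew(now) else "needs-replacement"

    def replace(self, new_token: DelegationToken) -> None:
        # The proposal would also pass the new token to the NMs (for log
        # aggregation) and to the AM; this sketch only swaps the stored token.
        self.token = new_token
```

Renewal alone keeps pushing the expiry forward until it reaches `max_date`. After that, only `replace` keeps the application alive without a restart.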
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727476#comment-13727476 ] Steve Loughran commented on YARN-896:

YARN-1011 (speculative containers) may be useful here too: you could have some speculative containers that come and go alongside a set of static containers with longer lifespans.
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727680#comment-13727680 ] Larry McCay commented on YARN-896:

While I am missing some of the important context of how tokens are issued for these long lived containers, I can introduce another pattern for token use that may be of some interest. If, when an application is submitted to the RM, it included tokens that represent the application's identity and have a sufficiently long expiration date, then those tokens could be exchanged for shorter-lived access tokens. When the application completes or is flagged as rogue, the identity token can be revoked/invalidated, after which the bearer can no longer acquire access tokens with it. This pattern eliminates the finite-lifespan issue that tokens such as the delegation token have, and at the same time reduces the amount of damage that can be done with an access token.

This pattern is being discussed as part of the Hadoop SSO efforts for user authentication, which you can find at HADOOP-9533 and HADOOP-9392. I have also filed a JIRA and posted a preliminary patch for a JsonWebToken for use in such a pattern: HADOOP-9781. It uses PKI-based cryptography for signing and verifying the token, which depends on HADOOP-9534 for a credential management framework.
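The identity/access exchange described above can be sketched in Python for illustration. Two assumptions are worth stating up front. First, HADOOP-9781 signs tokens with PKI, but this sketch substitutes an HMAC-SHA256 shared secret so it stays short and self-contained. Second, `TokenAuthority` and its methods are a hypothetical class, not the HADOOP-9781 API.

```python
import hashlib
import hmac
import json
import secrets
import time


class TokenAuthority:
    """Hypothetical issuer of long-lived identity tokens and short-lived access tokens."""

    def __init__(self, access_ttl_s: float = 300.0) -> None:
        self._key = secrets.token_bytes(32)   # shared secret (stand-in for PKI keys)
        self._revoked: set[str] = set()       # ids of revoked identity tokens
        self.access_ttl_s = access_ttl_s

    def _sign(self, claims: dict) -> str:
        body = json.dumps(claims, sort_keys=True)
        mac = hmac.new(self._key, body.encode(), hashlib.sha256).hexdigest()
        return f"{body}.{mac}"

    def _verify(self, token: str) -> dict:
        body, _, mac = token.rpartition(".")   # hex MAC never contains "."
        expected = hmac.new(self._key, body.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(mac, expected):
            raise PermissionError("bad signature")
        claims = json.loads(body)
        if claims["exp"] < time.time():
            raise PermissionError("token expired")
        return claims

    def issue_identity(self, app_id: str, lifetime_s: float) -> str:
        """Issued once at submission time, with a long expiry."""
        return self._sign({"kind": "identity", "app": app_id,
                           "jti": secrets.token_hex(8),
                           "exp": time.time() + lifetime_s})

    def exchange(self, identity_token: str) -> str:
        """Trade a valid, unrevoked identity token for a short-lived access token."""
        claims = self._verify(identity_token)
        if claims["kind"] != "identity" or claims["jti"] in self._revoked:
            raise PermissionError("identity token not usable")
        return self._sign({"kind": "access", "app": claims["app"],
                           "exp": time.time() + self.access_ttl_s})

    def check_access(self, access_token: str) -> str:
        """Return the application id an access token was issued to."""
        claims = self._verify(access_token)
        if claims["kind"] != "access":
            raise PermissionError("not an access token")
        return claims["app"]

    def revoke(self, identity_token: str) -> None:
        """Called on completion or when the app is flagged as rogue."""
        self._revoked.add(self._verify(identity_token)["jti"])
```

Revoking the identity token stops new exchanges. An access token that was already issued stays valid only until its short expiry, and that bounded window is the point of the pattern.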
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723968#comment-13723968 ] Siddharth Seth commented on YARN-896:

bq. Robert Joseph Evans Applications may connect to other services such as HBase or issue tokens for communication between its own containers. All of these would require renewal.

The RM takes care of renewing tokens for HDFS; it can do this because the HDFS token renewer class is on the RM's classpath. For other services (Hive, for example) this isn't possible. I believe Hive ends up issuing tokens that are valid for a longer duration to get around the renewal problem. I wouldn't necessarily link this to long running YARN, though, other than the bit about the token max-age.
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717572#comment-13717572 ] Thomas Weise commented on YARN-896:

[~revans2], applications may connect to other services such as HBase, or issue tokens for communication between their own containers. All of these tokens would require renewal.
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713692#comment-13713692 ] Robert Joseph Evans commented on YARN-896:

[~thw], I am not totally sure what you mean by app-specific tokens. Are these tokens that the app is going to use to connect to other services like HBase, or is it something else?

[~eric14] and [~enis], rolling upgrades are a very interesting use case. We can definitely add a ticket to support this type of thing. I agree that it needs to be thought through, and that it is going to require help from both the AM and YARN to do it properly.
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713992#comment-13713992 ] Robert Joseph Evans commented on YARN-896:

I filed one new JIRA, YARN-941, for updating tokens in the RM.

I started to file a JIRA for the AM to be informed of the location of its already running containers, but as I was writing it I realized that it would not give us enough information to reattach to the containers. The only thing it would give us is enough info to go shoot the containers, simply because there is no metadata about what port the container may be listening on or anything like that. It seems to me that we would be better off keeping a log, similar to the MR job history log, that contains all the data the AM needs to find its running containers. If others see a different need for this API, I am still happy to file a JIRA for it.

I have not filed a JIRA for anti-affinity yet either. I seem to remember another JIRA for something like this already, but I have not found it yet. I figure we can add a long lived process flag for the scheduler when we run across a use case for it.

The other parts discussed here either already have a JIRA covering the same functionality, or I think need a bit more discussion about exactly what we want to do: namely log aggregation/processing, and Hadoop package management/rolling upgrades of live applications. If I missed something, please let me know.
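The history-log idea can be sketched as an append-only record that the AM writes as containers start and stop, and that a restarted AM replays to find the containers (and ports) it should reattach to. This Python sketch rests on several assumptions: `ContainerLog` and its JSON-lines format are hypothetical, and a local file stands in for a file in HDFS. A real AM would also need to confirm that each recorded container is still alive, because a "finished" record can be lost if the AM dies first.

```python
import json
from pathlib import Path


class ContainerLog:
    """Hypothetical append-only log of container launches, replayed by a restarted AM."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _append(self, record: dict) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def launched(self, container_id: str, host: str, port: int) -> None:
        # Record what a new AM needs in order to reattach: where the container
        # runs and which port it listens on.
        self._append({"event": "launched", "id": container_id, "host": host, "port": port})

    def finished(self, container_id: str) -> None:
        self._append({"event": "finished", "id": container_id})

    def replay(self) -> dict[str, tuple[str, int]]:
        """Rebuild {container id: (host, port)} for containers believed to be running."""
        running: dict[str, tuple[str, int]] = {}
        if not self.path.exists():
            return running
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                if rec["event"] == "launched":
                    running[rec["id"]] = (rec["host"], rec["port"])
                else:
                    running.pop(rec["id"], None)
        return running
```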
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709956#comment-13709956 ] eric baldeschwieler commented on YARN-896:

IMO, you should be able to run a new framework / service simply by dropping a tarball / jar / war sort of thing into a well-known place and pointing to it in your Job invocation. I'm not sure what Hoya would need beyond this and the distributed cache to deploy HBase, but it would be great to get to the point where you simply drop either just a Hoya package (containing a version of HBase), or Hoya and an HBase tarball, into HDFS. Let's discuss and make a proposal.
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709242#comment-13709242 ] Thomas Weise commented on YARN-896:

We also identified the need for token renewal (app specific tokens). This should be a common need for long running services. Has it been discussed elsewhere?
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13707647#comment-13707647 ] Thomas Weise commented on YARN-896:

Bobby, thanks for putting this together. Some items from the DataTorrent wish list (most already covered above):
* gang scheduling (similar to [YARN-624|https://issues.apache.org/jira/browse/YARN-624?focusedCommentId=13662352page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13662352])
* affinity, anti-affinity
* return resource requests that cannot be met
* attach restarted AM to existing containers
* service registry
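Gang scheduling and "return resource requests that cannot be met" fit together naturally. A group of requests is placed all-or-nothing, and if any member cannot fit, the whole group is handed back. The following is a minimal Python sketch under simplifying assumptions: memory is the only resource, `gang_allocate` is a hypothetical helper, and placement uses a first-fit-decreasing heuristic rather than anything the YARN schedulers actually do.

```python
from typing import Optional


def gang_allocate(free_mb: dict[str, int], demands_mb: list[int]) -> Optional[dict[int, str]]:
    """All-or-nothing placement of a group of container requests.

    Returns {request index: node}, or None if any member cannot be placed,
    in which case nothing is reserved. First-fit decreasing is a heuristic
    and can miss packings that would actually fit.
    """
    remaining = dict(free_mb)  # work on a copy so a failed gang reserves nothing
    placement: dict[int, str] = {}
    # Place the largest requests first; they are the hardest to fit.
    for i in sorted(range(len(demands_mb)), key=lambda j: -demands_mb[j]):
        node = next((n for n in sorted(remaining) if remaining[n] >= demands_mb[i]), None)
        if node is None:
            return None  # the whole group can be returned as unsatisfiable
        remaining[node] -= demands_mb[i]
        placement[i] = node
    return placement
```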
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704623#comment-13704623 ] Robert Joseph Evans commented on YARN-896:

Chris, yes, I missed the app master retry issue. Those two, with the discussion on them, seem to cover what we are looking for.
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13703622#comment-13703622 ] Robert Joseph Evans commented on YARN-896:

No comments in the past few days. I would like to hear from more people involved, even if it is just to say that it looks like we have everything covered here. Then we can start filing JIRAs and getting some work done.
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699314#comment-13699314 ] Steve Loughran commented on YARN-896:

Based on our Hoya (HBase on YARN) work:
* we need a restarted AM to be given the existing set of containers from its previous instance. The use case there is that region servers should stay up while the AM and master are restarted.
* maybe: be able to warn YARN that the services will be long-lived. That could be used in scheduling and placement.
* anti-affinity is needed to declare that different container instances SHOULD be deployed on different nodes (use case: region servers). If failure domains are supported in the topology, anti-affinity should use them. I don't know if we'd want best-effort vs. absolute requirements.
* add the ability to increase the requirements of running containers, e.g. say this service is using more RAM than expected, so reduce the amount available to others.
* maybe: the ability to send kill signals to container processes, to do a graceful kill before escalating. This is of limited value if an extra process (such as {{bin/hbase}}) intervenes in the startup process.

There's also long-lived service discovery, a topic for another JIRA.
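To make the best-effort vs. absolute question concrete, here is a small Python sketch of anti-affinity placement that treats racks as failure domains. It spreads containers across racks first, then across nodes within a rack. When distinct nodes run out, it either fails (strict) or reuses nodes (best effort). `place_anti_affine` and the `node_rack` mapping are hypothetical, and a real scheduler would also have to weigh free capacity.

```python
def place_anti_affine(n: int, node_rack: dict[str, str], strict: bool = True) -> list[str]:
    """Pick nodes for n containers of one role, spreading across racks first.

    strict=True: every container must land on a distinct node, else ValueError.
    strict=False: best effort; nodes are reused (in the same spread order)
    once distinct nodes run out.
    """
    if n <= 0:
        return []
    if not node_rack:
        raise ValueError("no nodes available")
    # Group nodes by rack (failure domain), in a deterministic order.
    racks: dict[str, list[str]] = {}
    for node in sorted(node_rack):
        racks.setdefault(node_rack[node], []).append(node)
    # Interleave racks so consecutive picks fall in different failure domains.
    queues = [list(nodes) for _, nodes in sorted(racks.items())]
    order: list[str] = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.pop(0))
    if n <= len(order):
        return order[:n]
    if strict:
        raise ValueError(f"cannot place {n} containers on {len(order)} distinct nodes")
    return [order[i % len(order)] for i in range(n)]
```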
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698500#comment-13698500 ] Robert Joseph Evans commented on YARN-896:

During the most recent Hadoop Summit there was a developer meetup where we discussed some of these issues. This is to summarize what was discussed at that meeting and to add in a few things that have also been discussed on mailing lists and other places.

HDFS delegation tokens have a maximum lifetime. Currently, tokens submitted to the RM when the app master is launched will be renewed by the RM until the application finishes and the logs from the application have finished aggregating. The only token currently used by the YARN framework is the HDFS delegation token. This is used to read files from HDFS as part of the distributed cache and to write the aggregated logs out to HDFS. In order to support relaunching an app master after the maximum lifetime of the HDFS delegation token has passed, we either need to allow for tokens that do not expire or provide an API to allow the RM to replace the old token with a new one. Because removing the maximum lifetime of a token reduces the security of the cluster as a whole, I think it would be better to provide an API to replace the token with a new one. If we want to continue supporting log aggregation, we also need to provide a way for the Node Managers to get the new token. It is assumed that each app master will also provide an API to get the new token so it can start using it.

Log aggregation is another issue, although it is not required for long lived applications to work. Logs are aggregated into HDFS when the application finishes. This is not really that useful for applications that are never intended to exit. Ideally the processing of logs by the node manager should be pluggable, so that clusters and applications can select how and when logs are processed/displayed to the end user. Because many of these systems roll their logs to avoid filling up disks, we will probably need a protocol of some sort for the container to communicate with the Node Manager when logs are ready to be processed.

Another issue is to allow containers to outlive the app master that launched them, and also to allow containers to outlive the node manager that launched them. This is especially critical for the stability of applications during rolling upgrades to YARN.
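The "protocol of some sort" for log readiness could take many forms. As one illustration in Python, here is a size-based roller that closes each segment and then reports it through a callback. The callback stands in for a hypothetical container-to-NM "segment ready" notification. The class and parameter names are assumptions, not an existing YARN interface.

```python
from pathlib import Path
from typing import Callable


class RollingLog:
    """Size-based log roller that reports each completed segment to a callback.

    The callback stands in for a hypothetical "segment ready" message from the
    container to the Node Manager (or whatever ends up aggregating the logs).
    """

    def __init__(self, directory: Path, name: str, max_bytes: int,
                 on_ready: Callable[[Path], None]) -> None:
        self.directory = directory
        self.name = name
        self.max_bytes = max_bytes
        self.on_ready = on_ready
        self.segment = 0
        self._open_segment()

    def _open_segment(self) -> None:
        self.current = self.directory / f"{self.name}.{self.segment}"
        self._fh = self.current.open("wb")
        self._written = 0

    def write(self, data: bytes) -> None:
        # Roll before a write would push a non-empty segment past max_bytes.
        if self._written and self._written + len(data) > self.max_bytes:
            self.roll()
        self._fh.write(data)
        self._written += len(data)

    def roll(self) -> None:
        self._fh.close()             # segment is complete and flushed
        self.on_ready(self.current)  # safe to aggregate or delete now
        self.segment += 1
        self._open_segment()

    def close(self) -> None:
        # The final segment is also complete once the process stops writing.
        self._fh.close()
        self.on_ready(self.current)
```

A single write larger than `max_bytes` is never split; it just gets a segment of its own. Rolling in-process like this is only one option. The NM could equally do the rolling, as YARN-1104 proposes.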
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698505#comment-13698505 ] Robert Joseph Evans commented on YARN-896:

Another issue that has been discussed in the past is the impact that long lived processes can have on resource scheduling. It is possible for a long lived process to grab lots of resources and then never release them, even though it is using more resources than it would be allowed to have when the cluster is full. Recent preemption changes should be able to prevent this from happening between different queues/pools, but we may need to think about whether we need more control over this within a queue.