[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698500#comment-13698500
 ] 

Robert Joseph Evans commented on YARN-896:
------------------------------------------

During the most recent Hadoop Summit there was a developer meetup where we 
discussed some of these issues.  This is to summarize what was discussed at 
that meeting and to add in a few things that have also been discussed on 
mailing lists and other places.

HDFS delegation tokens have a maximum lifetime.  Currently, tokens submitted to 
the RM when the app master is launched will be renewed by the RM until the 
application finishes and the logs from the application have finished 
aggregating.  The only token currently used by the YARN framework is the HDFS 
delegation token.  This is used to read files from HDFS as part of the 
distributed cache and to write the aggregated logs out to HDFS.

In order to support relaunching an app master after the maximum lifetime of the 
HDFS delegation token has passed, we either need to allow for tokens that do 
not expire or provide an API to allow the RM to replace the old token with a 
new one.  Because removing the maximum lifetime of a token reduces the security 
of the cluster as a whole, I think it would be better to provide an API to 
replace the token with a new one.
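
To make the token replacement idea concrete, here is a minimal sketch in 
Python.  It is not an existing YARN or HDFS API: Token, TokenRegistry, 
replace_token and token_for_relaunch are hypothetical names, and time is passed 
in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for an HDFS delegation token."""
    ident: str
    issued_at: float     # seconds on an arbitrary clock
    max_lifetime: float  # seconds after issue at which the token stops working

    def expired(self, now: float) -> bool:
        return now >= self.issued_at + self.max_lifetime


class TokenRegistry:
    """Hypothetical RM-side store holding one current token per application."""

    def __init__(self) -> None:
        self._tokens = {}  # app_id -> Token

    def register(self, app_id: str, token: Token) -> None:
        self._tokens[app_id] = token

    def replace_token(self, app_id: str, new_token: Token, now: float) -> Token:
        """Swap in a fresh token and return the old one so it can be cancelled."""
        if app_id not in self._tokens:
            raise KeyError(f"unknown application: {app_id}")
        if new_token.expired(now):
            raise ValueError("replacement token is already expired")
        old = self._tokens[app_id]
        self._tokens[app_id] = new_token
        return old

    def token_for_relaunch(self, app_id: str, now: float) -> Token:
        """Return a token usable for relaunching the app master, or fail."""
        token = self._tokens[app_id]
        if token.expired(now):
            raise RuntimeError("token past its maximum lifetime; replace it first")
        return token
```

In this sketch a relaunch attempted after the maximum lifetime fails until the 
client calls replace_token with a fresh token, after which token_for_relaunch 
hands out the new one.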

If we want to continue supporting log aggregation we also need to provide a way 
for the Node Managers to get the new token too.  It is assumed that each app 
master will also provide an API to get the new token so it can start using it.


Log aggregation is another issue, although not required for long lived 
applications to work.  Logs are aggregated into HDFS when the application 
finishes.  This is not really that useful for applications that are never 
intended to exit.  Ideally the processing of logs by the node manager should be 
pluggable so that clusters and applications can select how and when logs are 
processed/displayed to the end user.  Because many of these systems roll their 
logs to avoid filling up disks we will probably need a protocol of some sort 
for the container to communicate with the Node Manager when logs are ready to 
be processed.
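
One possible shape for such a protocol, sketched in Python under stated 
assumptions: the container tells the Node Manager when a rolled log file is 
complete, and the Node Manager hands it to a pluggable handler.  The names 
NodeManagerLogService, log_ready and LogHandler are hypothetical and do not 
correspond to any existing YARN interface.

```python
from typing import Callable

# A pluggable handler receives (container_id, path) for each finished log file.
LogHandler = Callable[[str, str], None]


class NodeManagerLogService:
    """Hypothetical NM-side endpoint for 'log ready' notifications."""

    def __init__(self, handler: LogHandler) -> None:
        self._handler = handler
        self._seen = set()  # (container_id, path) pairs already processed

    def log_ready(self, container_id: str, path: str) -> bool:
        """Called by a container after it has rolled `path` and stopped writing to it.

        Returns True if the file was handed to the handler, False if the same
        notification had already been processed.
        """
        key = (container_id, path)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._handler(container_id, path)
        return True
```

A cluster could plug in a handler that uploads the file to HDFS, while an 
application could plug in one that forwards it elsewhere.  Ignoring repeated 
notifications keeps a retried message from processing the same file twice.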

Another issue is to allow containers to outlive the app master that launched 
them and also to allow containers to outlive the node manager that launched 
them.  This is especially critical for the stability of applications during 
rolling upgrades to YARN.
                
> Roll up for long lived YARN
> ---------------------------
>
>                 Key: YARN-896
>                 URL: https://issues.apache.org/jira/browse/YARN-896
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Robert Joseph Evans
>
> YARN is intended to be general purpose, but it is missing some features to be 
> able to truly support long lived applications and long lived containers.
> This ticket is intended to
>  # discuss what is needed to support long lived processes
>  # track the resulting JIRA.
