[ 
https://issues.apache.org/jira/browse/YARN-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012544#comment-14012544
 ] 

Jason Lowe commented on YARN-2114:
----------------------------------

Currently a container can obtain a list of local directories to use by
examining the LOCAL_DIRS environment variable.  However, these directories have
a lifespan that matches the application (i.e., they will only be deleted when
the entire application completes).  Therefore, if a container writes some
temporary data to one of these directories and then crashes or otherwise
orphans that data, the data won't be cleaned up when the container completes
but rather only when the entire application completes.  There are use cases for
both: data that survives as long as the application is active and data that
only survives as long as the container is active.

Given the way YARN works today, a container can take the list of directories
from LOCAL_DIRS and tack on the CONTAINER_ID to find its container-specific
directories.  However, that might not be forward compatible unless we commit to
that always working.  It would be cleaner if there were a separate variable,
maybe CONTAINER_LOCAL_DIRS, that listed the directories that are
container-specific rather than app-specific.
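
To make the workaround concrete, here is a rough sketch of what a container
might do today.  It assumes LOCAL_DIRS is a comma-separated list and that
appending CONTAINER_ID to each entry yields the per-container directory, which
is exactly the behavior we have not committed to.  CONTAINER_LOCAL_DIRS is
only the name floated above, not something YARN exports, and the helper name
container_local_dirs is made up for illustration.

```python
import os

# Hypothetical variable name floated in this comment; not an existing YARN API.
PROPOSED_VAR = "CONTAINER_LOCAL_DIRS"


def container_local_dirs(env=None):
    """Return the directories assumed to be scoped to this container."""
    env = os.environ if env is None else env

    # Prefer the proposed variable, should a NodeManager ever export it.
    explicit = env.get(PROPOSED_VAR)
    if explicit:
        return [d for d in explicit.split(",") if d]

    # Fallback: the workaround described above. This relies on the
    # assumption that <app dir>/<container id> is the container directory.
    container_id = env.get("CONTAINER_ID")
    if not container_id:
        return []
    app_dirs = [d for d in env.get("LOCAL_DIRS", "").split(",") if d]
    return [os.path.join(d, container_id) for d in app_dirs]


if __name__ == "__main__":
    # Paths and IDs below are invented purely for the example.
    example_env = {
        "LOCAL_DIRS": "/disk1/app_1,/disk2/app_1",
        "CONTAINER_ID": "container_1",
    }
    for path in container_local_dirs(example_env):
        print(path)
```

With a dedicated variable, the fallback branch and the layout assumption behind
it would go away entirely.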

> Inform container of container-specific local directories
> --------------------------------------------------------
>
>                 Key: YARN-2114
>                 URL: https://issues.apache.org/jira/browse/YARN-2114
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>
> It would be nice if a container could know which local directories it can use 
> for temporary data such that those directories are automatically cleaned up 
> when the container exits.  The current working directory is one of those 
> directories, but it's tricky (and potentially not forward-compatible) to 
> determine the other directories to use on a multi-disk node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)