[ https://issues.apache.org/jira/browse/YARN-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012544#comment-14012544 ]
Jason Lowe commented on YARN-2114:
----------------------------------

Currently a container can obtain a list of local directories to use by examining the LOCAL_DIRS environment variable. However, these directories have a lifespan that matches the application (i.e., they will only be deleted when the entire application completes). Therefore, if a container writes temporary data to one of these directories and then crashes or otherwise orphans that data, the data won't be cleaned up when the container completes but only when the entire application completes.

There are use cases for both: data that survives as long as the application is active, and data that survives only as long as the container is active.

Given the way YARN works today, a container can take the list of directories from LOCAL_DIRS and tack on the CONTAINER_ID to find the container-specific directories. However, that might not be forward compatible unless we commit to that always working. It would be cleaner if there were a separate variable, maybe CONTAINER_LOCAL_DIRS, that listed the directories that are container-specific rather than app-specific.

> Inform container of container-specific local directories
> --------------------------------------------------------
>
>                 Key: YARN-2114
>                 URL: https://issues.apache.org/jira/browse/YARN-2114
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>
> It would be nice if a container could know which local directories it can use
> for temporary data and those directories will be automatically cleaned up
> when the container exits. The current working directory is one of those
> directories, but it's tricky (and potentially not forward-compatible) to
> determine the other directories to use on a multi-disk node.
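
The workaround described in the comment (taking each entry of LOCAL_DIRS and appending CONTAINER_ID) could be sketched roughly as follows. This is only an illustration of the current, uncommitted behavior, not a supported API: the helper name container_local_dirs is hypothetical, and the sketch assumes LOCAL_DIRS is a comma-separated list of application-level directories.

```python
import os


def container_local_dirs(env=None):
    """Derive per-container directories from LOCAL_DIRS and CONTAINER_ID.

    Hypothetical helper: relies on today's directory layout, which the
    comment notes may not be forward compatible.
    """
    env = os.environ if env is None else env
    # Assumption: LOCAL_DIRS is a comma-separated list of app-level dirs.
    app_dirs = [d for d in env.get("LOCAL_DIRS", "").split(",") if d]
    container_id = env.get("CONTAINER_ID")
    if not app_dirs or not container_id:
        # Not running inside a container, or the variables are missing.
        return []
    # "Tack on" the container ID to each app-level directory.
    return [os.path.join(d, container_id) for d in app_dirs]
```

Inside a container this would be called with no arguments so that it reads the process environment. A dedicated variable such as the proposed CONTAINER_LOCAL_DIRS would make this path arithmetic unnecessary.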