[ https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183000#comment-16183000 ]
Jason Lowe commented on YARN-7244:
----------------------------------

bq. Only potential issue which I see is that, once a set of dirs are pulled from LocalDirAllocator#ctx.localDirs, these dirs will be validated only when one more getLocalPathForWrite/Read is invoked. So there could be a window where we may get a stale dirs.

I wouldn't worry too much about that window. Think of the much larger window a container gets, since it is told only once, on startup, what the list of valid dirs is. I think we're fine as long as aux services are notified fairly soon after a disk fails. The notification doesn't have to be instantaneous or atomic.

We could make a pull API where the aux service essentially calls the NM's LocalDirsHandlerService directly to get a path to read or a path to write. Then the aux service doesn't even have to manage the directories itself if all it cares about is finding a place to write or read.


> ShuffleHandler is not aware of disks that are added
> ---------------------------------------------------
>
>                 Key: YARN-7244
>                 URL: https://issues.apache.org/jira/browse/YARN-7244
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-7244.001.patch, YARN-7244.002.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM
> startup. If disks are later added to the node, then map tasks will start using
> them but the ShuffleHandler will not be aware of them. The end result is that
> the data cannot be shuffled from the node, leading to fetch failures and
> re-runs of the map tasks.
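
To make the pull API idea from the comment above concrete, here is a rough sketch. It is only an illustration under stated assumptions, not NodeManager code: LocalPathProvider, InMemoryDirTracker, and PullingAuxService are hypothetical names, the method names simply mirror the getLocalPathForWrite/Read calls mentioned in the quoted text, and the "fewest files" choice for writes is a stand-in for whatever policy the NM would really apply. The point is that the aux service keeps no dir list of its own, so a disk added or removed later is picked up on the next call.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical pull-style API; not an existing NodeManager interface. */
interface LocalPathProvider {
    /** Full path of an existing file on a currently good dir, if any. */
    Optional<String> getLocalPathForRead(String relativePath);

    /** Full path on a currently good dir where the file may be written, if any. */
    Optional<String> getLocalPathForWrite(String relativePath);
}

/** In-memory stand-in for the NM-side tracking of good local dirs. */
final class InMemoryDirTracker implements LocalPathProvider {
    // good dir -> relative paths stored there, kept in insertion order
    private final Map<String, Set<String>> goodDirs = new LinkedHashMap<>();

    /** Simulates a disk being added, or passing a health check again. */
    synchronized void addDir(String dir) {
        goodDirs.putIfAbsent(dir, new HashSet<>());
    }

    /** Simulates a disk failing its health check. */
    synchronized void removeDir(String dir) {
        goodDirs.remove(dir);
    }

    @Override
    public synchronized Optional<String> getLocalPathForRead(String relativePath) {
        for (Map.Entry<String, Set<String>> e : goodDirs.entrySet()) {
            if (e.getValue().contains(relativePath)) {
                return Optional.of(e.getKey() + "/" + relativePath);
            }
        }
        return Optional.empty();
    }

    @Override
    public synchronized Optional<String> getLocalPathForWrite(String relativePath) {
        // Assumed policy: pick the good dir holding the fewest files;
        // ties go to the dir that was added first.
        Map.Entry<String, Set<String>> best = null;
        for (Map.Entry<String, Set<String>> e : goodDirs.entrySet()) {
            if (best == null || e.getValue().size() < best.getValue().size()) {
                best = e;
            }
        }
        if (best == null) {
            return Optional.empty();
        }
        best.getValue().add(relativePath);
        return Optional.of(best.getKey() + "/" + relativePath);
    }
}

/** Aux service that caches no dir list; it asks the provider on every lookup. */
final class PullingAuxService {
    private final LocalPathProvider provider;

    PullingAuxService(LocalPathProvider provider) {
        this.provider = provider;
    }

    Optional<String> locateMapOutput(String relativePath) {
        return provider.getLocalPathForRead(relativePath);
    }
}
```

With this shape, a file written to a dir that was added after startup is found on the next locateMapOutput call, and a file on a dir that has since been removed is no longer returned. The stale window shrinks to the interval between the NM's own dir checks.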