[ https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182720#comment-16182720 ]
Sunil G commented on YARN-7244:
-------------------------------

Thanks [~jlowe] for adding more clarity on this.

A 'pull' model may be better and could work for all such cases. As Jason suggested, if apps could learn the latest dirs from {{getLocalDirsForRead/Write}}, the shuffle handler would always have a list of valid dirs. The only potential issue I see is that once a set of dirs is pulled from {{LocalDirAllocator#ctx.localDirs}}, these dirs are validated again only when another getLocalPathForWrite/Read is invoked. So there could be a window in which we get stale dirs. If the new API {{LocalDirAllocator#getLocalDirsForRead}} could call {{confChanged}}, then I think it would be the source of truth for localDirs at a given point in time.

bq. Do you think, we can improve this to skip as default behavior itself

Currently in this patch, you are trying to avoid the disk validation check when shouldFilter is false. To add more context, maybe we could skip this check here, provided the ShuffleHandler already has valid dirs obtained through the API mentioned earlier.

> ShuffleHandler is not aware of disks that are added
> ---------------------------------------------------
>
>                 Key: YARN-7244
>                 URL: https://issues.apache.org/jira/browse/YARN-7244
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-7244.001.patch, YARN-7244.002.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM
> startup. If disks later are added to the node then map tasks will start using
> them but the ShuffleHandler will not be aware of them. The end result is that
> the data cannot be shuffled from the node leading to fetch failures and
> re-runs of the map tasks.
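
For illustration only, the stale-window concern raised in the comment can be sketched in plain Java. This is not Hadoop code: {{DirCache}}, {{cachedDirs}}, {{refresh}} and the supplier of configured dirs are all hypothetical stand-ins, with {{refresh}} playing the role the comment assigns to {{confChanged}}. The sketch assumes the proposed pull-style accessor would re-read the configured dirs before returning a snapshot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are Hadoop APIs.
class PullModelSketch {

    /** Stand-in for an allocator that caches the list of local dirs. */
    static class DirCache {
        // Stand-in for reading the current dir list from configuration.
        private final Supplier<List<String>> configuredDirs;
        private List<String> cached = new ArrayList<>();

        DirCache(Supplier<List<String>> configuredDirs) {
            this.configuredDirs = configuredDirs;
            refresh();
        }

        /** Re-reads the configured dirs (the role given to confChanged in the comment). */
        private synchronized void refresh() {
            cached = new ArrayList<>(configuredDirs.get());
        }

        /** Returns the cached list without refreshing, so it can be stale. */
        synchronized List<String> cachedDirs() {
            return new ArrayList<>(cached);
        }

        /** Pull-style accessor: refresh first, then return a snapshot. */
        synchronized List<String> getDirsForRead() {
            refresh();
            return new ArrayList<>(cached);
        }
    }

    public static void main(String[] args) {
        List<String> disks = new ArrayList<>(Arrays.asList("/data1", "/data2"));
        DirCache cache = new DirCache(() -> disks);

        // A disk is added after startup.
        disks.add("/data3");

        System.out.println("cached (may be stale): " + cache.cachedDirs());
        System.out.println("pulled (refreshed):    " + cache.getDirsForRead());
    }
}
```

In this sketch the cached view misses the newly added dir until something triggers a refresh, while the pull-style accessor sees it immediately. Whether the real {{LocalDirAllocator}} can refresh this cheaply on every call is a separate question that the sketch does not address.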