[jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added

Jason Lowe (JIRA) Wed, 27 Sep 2017 11:10:32 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183000#comment-16183000
 ]


Jason Lowe commented on YARN-7244:
----------------------------------

bq. The only potential issue I see is that once a set of dirs is pulled 
from LocalDirAllocator#ctx.localDirs, those dirs are validated only when 
the next getLocalPathForWrite/Read is invoked. So there could be a window where 
we get stale dirs.

I wouldn't worry too much about that window.  Think of the much larger window a 
container gets, since it is told only once, on startup, what the list of valid 
dirs is.  I think we're fine as long as aux services are notified fairly soon 
after a disk fails.  It doesn't have to be instantaneous or atomic.  
Alternatively, we could add a pull API where the aux service calls the NM's 
LocalDirsHandlerService directly to get a path to read or a path to write.  
Then the aux service doesn't have to manage the directories itself if all it 
cares about is finding a place to write or read.
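
To make the pull idea concrete, here is a rough sketch of what it could look 
like.  This is not an existing YARN API: LocalPathProvider, NodeLocalDirs, 
ShuffleLikeAuxService, markGood and markFailed are all hypothetical names made 
up for illustration, and NodeLocalDirs is only a stand-in for whatever the NM 
would expose.  The point is that the aux service keeps no directory list of its 
own and asks the NM on every request, so added or failed disks are seen on the 
next call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical pull-style interface an aux service would call (assumption, not a YARN API).
interface LocalPathProvider {
    Path getLocalPathForWrite(String relativePath) throws IOException;
    Path getLocalPathForRead(String relativePath) throws IOException;
}

// Stand-in for the NM side, which owns the current list of good local dirs.
class NodeLocalDirs implements LocalPathProvider {
    private final CopyOnWriteArrayList<Path> goodDirs = new CopyOnWriteArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    // Called by the (hypothetical) disk health checker when a disk is added or recovers.
    void markGood(Path dir) { goodDirs.addIfAbsent(dir); }

    // Called by the (hypothetical) disk health checker when a disk fails.
    void markFailed(Path dir) { goodDirs.remove(dir); }

    @Override
    public Path getLocalPathForWrite(String relativePath) throws IOException {
        // Copy once so the size and the index refer to the same list contents.
        List<Path> snapshot = new ArrayList<>(goodDirs);
        if (snapshot.isEmpty()) {
            throw new IOException("No good local dirs available");
        }
        // Simple round-robin choice among the dirs that are good right now.
        int i = Math.floorMod(next.getAndIncrement(), snapshot.size());
        return snapshot.get(i).resolve(relativePath);
    }

    @Override
    public Path getLocalPathForRead(String relativePath) throws IOException {
        // Search only the dirs that are currently considered good.
        for (Path dir : goodDirs) {
            Path candidate = dir.resolve(relativePath);
            if (Files.exists(candidate)) {
                return candidate;
            }
        }
        throw new IOException("Not found in any good local dir: " + relativePath);
    }
}

// An aux service that never caches the dir list; it pulls a path per request.
class ShuffleLikeAuxService {
    private final LocalPathProvider dirs;

    ShuffleLikeAuxService(LocalPathProvider dirs) { this.dirs = dirs; }

    Path locate(String mapOutput) throws IOException {
        return dirs.getLocalPathForRead(mapOutput);
    }
}
```

With this shape the stale window is at most one call long: a path handed out 
just before a disk fails can still be bad, but nothing stays stale until the 
next restart.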




> ShuffleHandler is not aware of disks that are added
> ---------------------------------------------------
>
>                 Key: YARN-7244
>                 URL: https://issues.apache.org/jira/browse/YARN-7244
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-7244.001.patch, YARN-7244.002.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks are later added to the node, map tasks will start using 
> them, but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node, leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
