[ https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184319#comment-16184319 ]
Jason Lowe commented on YARN-7244:
----------------------------------
bq. rather a new api as you mentioned in LocalDirAllocator named
getLocalDirsForRead could enough to get valid dirs as it pulls all configured
NM_LOCAL_DIRS and validates same.
LocalDirAllocator should not have the new API, IMHO. That class is in
hadoop-common and shouldn't be involved in solving this nodemanager-specific
problem.
I'm thinking we go with the pull approach with something like the following.
Note that I'm not stuck on the specific names of new interfaces/classes,
they're just examples for reference.
# In AuxiliaryService add new methods to get and set the API object to interact
with the NM's local dirs management, e.g.:
{code}
public AuxiliaryLocalPathHandler getAuxiliaryLocalPathHandler();
public void setAuxiliaryLocalPathHandler(AuxiliaryLocalPathHandler handler);
{code}
# The new AuxiliaryLocalPathHandler object would be in hadoop-yarn-api and look
something like this:
{code}
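import org.apache.hadoop.fs.Path;

public interface AuxiliaryLocalPathHandler {
  Path getLocalPathForRead(String path);
  Path getLocalPathForWrite(String path);
  // size: expected file size, analogous to LocalDirAllocator's size argument
  Path getLocalPathForWrite(String path, long size);
}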
{code}
# AuxiliaryService would implement a handler that maps the AuxiliaryLocalPathHandler calls to the NM's LocalDirsHandlerService.
# The ShuffleHandler can leverage the new AuxiliaryLocalPathHandler to find
shuffle input files rather than manage its own LocalDirAllocator.
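To make the pull approach concrete, here is a minimal standalone sketch; it is not the eventual Hadoop code. Assumptions: it uses java.nio.file.Path instead of org.apache.hadoop.fs.Path so it runs without Hadoop on the classpath, a Supplier<List<String>> stands in for the NM's LocalDirsHandlerService, and the class NmBackedPathHandler and its simple first-fit directory choice are hypothetical. The property that matters is that every call re-reads the current list of good dirs, so a disk added after startup is visible on the next lookup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

// Proposed interface, using java.nio.file.Path (assumption) instead of Hadoop's Path.
interface AuxiliaryLocalPathHandler {
  Path getLocalPathForRead(String path) throws IOException;
  Path getLocalPathForWrite(String path) throws IOException;
  Path getLocalPathForWrite(String path, long size) throws IOException;
}

// Hypothetical NM-side implementation. The supplier stands in for
// LocalDirsHandlerService and is consulted on every call (the "pull").
class NmBackedPathHandler implements AuxiliaryLocalPathHandler {
  private final Supplier<List<String>> goodLocalDirs;

  NmBackedPathHandler(Supplier<List<String>> goodLocalDirs) {
    this.goodLocalDirs = goodLocalDirs;
  }

  @Override
  public Path getLocalPathForRead(String path) throws IOException {
    // Return the first good dir that already contains the file.
    for (String dir : goodLocalDirs.get()) {
      Path candidate = Paths.get(dir, path);
      if (Files.exists(candidate)) {
        return candidate;
      }
    }
    throw new IOException("Could not find " + path + " in any good local dir");
  }

  @Override
  public Path getLocalPathForWrite(String path) throws IOException {
    return getLocalPathForWrite(path, -1L); // -1: size unknown
  }

  @Override
  public Path getLocalPathForWrite(String path, long size) throws IOException {
    // Simplified first-fit choice; LocalDirAllocator's real policy differs.
    for (String dir : goodLocalDirs.get()) {
      Path root = Paths.get(dir);
      if (Files.isDirectory(root)
          && (size < 0 || Files.getFileStore(root).getUsableSpace() >= size)) {
        return root.resolve(path);
      }
    }
    throw new IOException("No good local dir can hold " + path);
  }
}

class PullApproachDemo {
  public static void main(String[] args) throws IOException {
    // Live list of good dirs; the NM would update this as disks come and go.
    List<String> dirs = new CopyOnWriteArrayList<>();
    dirs.add(Files.createTempDirectory("disk1").toString());
    AuxiliaryLocalPathHandler handler = new NmBackedPathHandler(() -> dirs);

    // A disk is added after the handler exists, and a map task writes to it.
    Path disk2 = Files.createTempDirectory("disk2");
    dirs.add(disk2.toString());
    Files.createFile(disk2.resolve("file.out"));

    // The lookup still succeeds because the dir list is pulled per call.
    System.out.println(handler.getLocalPathForRead("file.out").startsWith(disk2)); // true
  }
}
```

The supplier indirection is the design point: the aux service never caches the dir list, so the NM remains the single source of truth for which disks are good.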
> ShuffleHandler is not aware of disks that are added
> ---------------------------------------------------
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM
> startup. If disks are later added to the node, map tasks will start using
> them, but the ShuffleHandler will not be aware of them. The end result is that
> the data cannot be shuffled from the node, leading to fetch failures and
> re-runs of the map tasks.
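The failure mode in the description can be reproduced with a small in-memory model. This is an illustration only: disks are plain strings, a hypothetical FILES_BY_DISK map stands in for the filesystem, and nothing here is Hadoop code. A lookup over a list copied at startup misses output written to a disk added afterwards, while a lookup over the live list finds it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// In-memory model of the failure mode; not Hadoop code.
class StaleDiskListDemo {
  // Hypothetical stand-in for the filesystem: which files exist on which disk.
  static final Map<String, List<String>> FILES_BY_DISK = new HashMap<>();

  // Searches the given disks, in order, for a map output file.
  static Optional<String> find(List<String> disks, String file) {
    for (String disk : disks) {
      if (FILES_BY_DISK.getOrDefault(disk, List.of()).contains(file)) {
        return Optional.of(disk);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<String> liveDisks = new ArrayList<>(List.of("/disk1"));
    // Snapshot taken once at NM startup, as the ShuffleHandler effectively does.
    List<String> startupSnapshot = List.copyOf(liveDisks);

    // A disk is added later and a map task writes its output there.
    liveDisks.add("/disk2");
    FILES_BY_DISK.computeIfAbsent("/disk2", d -> new ArrayList<>()).add("attempt_1/file.out");

    System.out.println(find(startupSnapshot, "attempt_1/file.out")); // Optional.empty (fetch failure)
    System.out.println(find(liveDisks, "attempt_1/file.out"));       // Optional[/disk2]
  }
}
```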